Posted to dev@subversion.apache.org by kf...@collab.net on 2005/02/16 18:27:52 UTC

Re: "Malformed XML" (l10n?) problems on checkout

Charles Bailey <ba...@newman.upenn.edu> writes:
> I've recently started using svn (1.1.3 built under Mac OS X 10.3), and
> in early testing have tripped over a problem with this error:
> 
> svn: Malformed XML: not well-formed (invalid token) at line 5988

This sounds like a problem in a .svn/entries or .svn/log file
somewhere in your working copy, perhaps due to a bug whereby
Subversion may be failing to properly escape some XML-sensitive
filenames or something.

If you can narrow this down to a data set that still reproduces the
bug, but that you'd be willing to share, that'd be terrific.  Failing
that, if you can find the .svn/entries or .svn/log file with the
problem, and show it to us, that would be great.
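
To illustrate the search suggested above, here is a minimal sketch that
walks a working copy and reports any .svn/entries or .svn/log* file
containing bytes that XML 1.0 does not allow.  The function names are
hypothetical, and the assumption that a stray control byte is the culprit
is only one possibility; this is not part of Subversion.

```python
import os

# Bytes below 0x20 are not legal XML 1.0 characters, except tab, LF and CR.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def find_bad_bytes(path):
    """Return (line_number, byte_value) pairs for illegal control bytes."""
    hits = []
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            for byte in line:
                if byte < 0x20 and byte not in ALLOWED_CONTROLS:
                    hits.append((line_number, byte))
    return hits


def scan_working_copy(root):
    """Yield (path, hits) for each .svn/entries or .svn/log* file with bad bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != ".svn":
            continue
        for name in filenames:
            if name == "entries" or name.startswith("log"):
                path = os.path.join(dirpath, name)
                hits = find_bad_bytes(path)
                if hits:
                    yield path, hits
```

Calling scan_working_copy(".") from the top of the checkout and printing
each result gives the file and line numbers worth inspecting by hand.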

Thanks for the report,
-Karl

> Here're the details:
> 
> 0. Repository created with "svnadmin create"; no mods other than permissions
> 1. Populated from client machine (with a bunch of prefs files) using
>       svn import <path> svn+ssh://<server>/<rpath>
>    with LANG and LC_CTYPE set to "en_US.UTF-8" on both machines
>    (without which I see "can't recode" errors); this succeeds
>    without complaint
> 2. Attempt to retrieve files back to client using
>       svn co svn+ssh://<server>/<rpath> <new-local-root>
>    grinds away for a while, then falls over with
>       A  <new-local-root>/Library/Preferences/Deneba/User Dictionary
>       A  <new-local-root>/Library/Preferences/Deneba/CVULg.set
>       A  <new-local-root>/Library/Preferences/Deneba/CVAppData.set
>       svn: Malformed XML: not well-formed (invalid token) at line 5988
> 3. If I cd down into .../Deneba and run "svn cleanup", it succeeds,
>    but trying to cleanup .../Preferences yields
>       svn: In directory ''
>       svn: Can't copy '.svn/tmp/text-base/TrueType Font Editor Prefs.svn-base' \
>       to 'TrueType Font Editor Prefs.4.tmp': No such file or directory
> 
> Since Googling for the "Malformed XML" error yields some mention of
> Unicode issues, and I'd stumbled over svn's requirement for custom
> LANG/LC_CTYPE settings above, I wondered whether this was also a
> Unicode-related problem. There are a couple files in the repository
> with non-ASCII characters in their names, but none of them are close
> (in collating sequence) to the last files checked out, and a scan of
> the .../Preferences/.svn/log* doesn't turn up any problems with these
> files.  FWIW, "svn list svn+ssh://<server>/<rpath>" also works fine.
> 
> If this is pure thick-headedness on my part, pointers to TFM/TFMLA
> happily accepted.  If not, I'd appreciate any hints as to where I
> might best look next.  I'm willing to give the current HEAD version a
> try, too, if Knowledgeable Folks think the issue's likely been fixed
> already.
> 
> Thanks.
> 
> 
> --
> Regards,
> Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
> Newman Center at the University of Pennsylvania
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: "Malformed XML" (l10n?) problems on checkout

Posted by Julian Reschke <ju...@gmx.de>.
Dale Worley wrote:
>> From: Charles Bailey [mailto:bailey@newman.upenn.edu]
>
>> Sure.  I think I've got it.  By process of elimination, the offending
>> file seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the
>> usual \x08 backspace character).  (No, I've no idea why the creator
>> of this file -- likely OpenOffice.org -- chose to use this name.)
> 
> 
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that, so
> he typed ^H three times, which moved the cursor back three spaces, then he
> typed "test", which wrote over the offending "ooo", but didn't actually
> remove them from the program's input buffer.  Similarly, the final ^H was to
> correct the second "f" so he could replace it with "i".  The name he thought
> he was getting was ".testfile".
> 
> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse, what
> you might think is a valid escape sequence to represent it -- &#8; -- is also
> forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
> 
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
> 
> Subversion may need to extend XML to allow this (and has to verify that its
> XML parser can deal with it).

No. Please. Do not "extend" XML. Extending it means breaking existing 
clients that use conforming XML 1.0 parsers.

The proper way (WebDAV does that, so why not Subversion in other 
places?) is to use URIs rather than names (they will never contain 
control characters).
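
As a rough illustration of the URI approach, the sketch below
percent-encodes the problematic file name from this thread so that no
control characters remain, and decodes it again.  It uses Python's standard
urllib.parse; the helper names are hypothetical and this says nothing about
how Subversion or WebDAV implement it.

```python
from urllib.parse import quote, unquote


def encode_name(name):
    """Percent-encode a single path component so no control characters remain."""
    # safe="" also encodes "/", since this is one component, not a whole path.
    return quote(name, safe="")


def decode_name(encoded):
    """Reverse encode_name()."""
    return unquote(encoded)


name = ".ooo\x08\x08\x08testff\x08ile"
encoded = encode_name(name)
print(encoded)  # .ooo%08%08%08testff%08ile
```

The encoded form contains only printable ASCII, so it can be placed in an
XML 1.0 attribute or element without further escaping of control codes.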

Best regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

Re: "Malformed XML" (l10n?) problems on checkout

Posted by Charles Bailey <ba...@newman.upenn.edu>.
--On February 16, 2005 10:44:17 PM -0600 kfogel@collab.net wrote:

> Charles Bailey <ba...@newman.upenn.edu> writes:
>> Two quick questions:
>> - Is it worth a small patch to xml.c to include the contents of the
>> offending buffer in a "Malformed XML" error message?  It could yield a
>> long/multiline message, but would help identify the offending text in
>> a large operation.
>
> Hmmm.  That's tough.  How big can the buffer get?  (It would do us no
> good to just display part of it, since we don't know where the problem
> characters would be.)  We could have a heuristic that we display the
> buffer if it's below a certain size, and don't if it's above.
>
> Are you, um, volunteering? :-)

Sure.  I just didn't want to take the time if there was already a 
no-messages-longer-than-60-chars policy in place.

>> - Is it worth adding a bit of text about path name requirements to the
>> docs?  Would it go in the Book (e.g. the UTF/Path requirements bit of
>> the developer info, or even a brief blurb about character sets in Ch
>> 1), or elsewhere?
>
> Yes, that patch would be uncontroversially welcome, I think!

OK, let me see if I can cons up a little sidebar on character sets for 
somewhere in Ch 1 or 2 of the Book.

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania


Re: "Malformed XML" (l10n?) problems on checkout

Posted by kf...@collab.net.
"Dale Worley" <dw...@pingtel.com> writes:
> It would seem to me that if svn was to generate "Malformed XML" messages, it
> should do so like a good compiler generates messages, mentioning fairly
> narrowly the nature of the violation, the file (or whatever) within which
> the violation was found, and the line/character offset at which the
> violation was found.  If it is dynamically created XML passed by the server
> or some such, svn should save it into a file, and tell the user the name of
> the file (as is now done with commit messages upon failed commits), so the
> user can attach it to the bug report.
> 
> Yeah, that's more than a bit of work, but good error reporting takes work,
> and is almost always worth it.

I totally agree, and I love reviewing patches... :-)

-Karl

RE: "Malformed XML" (l10n?) problems on checkout

Posted by Dale Worley <dw...@pingtel.com>.
> From: kfogel@newton.ch.collab.net
>
> Charles Bailey <ba...@newman.upenn.edu> writes:
> > Two quick questions:
> > - Is it worth a small patch to xml.c to include the contents of the
> > offending buffer in a "Malformed XML" error message?  It could yield a
> > long/multiline message, but would help identify the offending text in
> > a large operation.
>
> Hmmm.  That's tough.  How big can the buffer get?  (It would do us no
> good to just display part of it, since we don't know where the problem
> characters would be.)  We could have a heuristic that we display the
> buffer if it's below a certain size, and don't if it's above.

It would seem to me that if svn was to generate "Malformed XML" messages, it
should do so like a good compiler generates messages, mentioning fairly
narrowly the nature of the violation, the file (or whatever) within which
the violation was found, and the line/character offset at which the
violation was found.  If it is dynamically created XML passed by the server
or some such, svn should save it into a file, and tell the user the name of
the file (as is now done with commit messages upon failed commits), so the
user can attach it to the bug report.

Yeah, that's more than a bit of work, but good error reporting takes work,
and is almost always worth it.
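
A minimal sketch of the compiler-style report described above, using
Python's expat binding rather than Subversion's C code.  The function name,
the message layout and the temporary-file naming are all assumptions made
for illustration.

```python
import os
import tempfile
import xml.parsers.expat


def describe_xml_error(data, source="<memory>"):
    """Parse XML bytes; return None if well-formed, else a detailed message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
        return None
    except xml.parsers.expat.ExpatError as err:
        # Human-readable reason for the failure, as reported by expat.
        reason = xml.parsers.expat.ErrorString(err.code)
        # Save the offending document so it can be attached to a bug report.
        fd, saved = tempfile.mkstemp(prefix="malformed-", suffix=".xml")
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # source:line:column, in the style of a compiler diagnostic.
        return "%s:%d:%d: Malformed XML: %s (copy saved to %s)" % (
            source, err.lineno, err.offset, reason, saved)
```

Passing the path of the file being parsed as source gives a message that
names the file, the line and column, the nature of the violation, and where
a copy of the data was saved.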

Dale


Re: "Malformed XML" (l10n?) problems on checkout

Posted by kf...@collab.net.
Charles Bailey <ba...@newman.upenn.edu> writes:
> Two quick questions:
> - Is it worth a small patch to xml.c to include the contents of the
> offending buffer in a "Malformed XML" error message?  It could yield a
> long/multiline message, but would help identify the offending text in
> a large operation.

Hmmm.  That's tough.  How big can the buffer get?  (It would do us no
good to just display part of it, since we don't know where the problem
characters would be.)  We could have a heuristic that we display the
buffer if it's below a certain size, and don't if it's above.
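
The size heuristic could look roughly like the sketch below.  The 512-byte
threshold, the function name and the message wording are hypothetical; the
real change would be in C in xml.c.

```python
MAX_SHOWN_BYTES = 512  # hypothetical threshold, not a Subversion setting


def malformed_xml_message(reason, line, buffer, limit=MAX_SHOWN_BYTES):
    """Build an error message, appending the buffer only when it is small."""
    message = "Malformed XML: %s at line %d" % (reason, line)
    if len(buffer) <= limit:
        # Escape control characters so they are visible in the message.
        text = buffer.decode("utf-8", errors="replace")
        shown = text.encode("unicode_escape").decode("ascii")
        message += "\nOffending buffer: " + shown
    else:
        message += "\n(buffer of %d bytes not shown)" % len(buffer)
    return message
```

Escaping the buffer matters here because the offending characters may be
invisible when printed raw, as with the backspace in this thread.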

Are you, um, volunteering? :-)

> - Is it worth adding a bit of text about path name requirements to the
> docs?  Would it go in the Book (e.g. the UTF/Path requirements bit of
> the developer info, or even a brief blurb about character sets in Ch
> 1), or elsewhere?

Yes, that patch would be uncontroversially welcome, I think!

Re: "Malformed XML" (l10n?) problems on checkout

Posted by Charles Bailey <ba...@newman.upenn.edu>.
--On February 16, 2005 3:22:33 PM -0600 kfogel@collab.net wrote:

> "Dale Worley" <dw...@pingtel.com> writes:
>>
>> But you've identified the Subversion problem correctly -- a file name can
>> contain "non-printable" characters, which are forbidden in XML.  Worse,
>> what you might think is a valid escape sequence to represent it -- &#8;
>> -- is also forbidden in XML, because "character entities" are forbidden
>> from representing non-printable characters.  See the discussion in
>>
>> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
>>
>> Subversion may need to extend XML to allow this (and has to verify that
>> its XML parser can deal with it).
>
> Thanks for the analysis and the reference, Dale.  You're right, and
> Subversion's way of dealing with this is that (now) it no longer
> permits such paths in the repository.  See issue #1954, and see r12581
> and r12632.  There is a long thread on the topic, linked to from the
> issue I believe, that explains the reasoning behind the decision.

Thanks; that's quite helpful.  I'm sorry to have missed it in the archive. 
I was too focused on the specific error, and should have searched for any 
XML-related issue.

> So Charles, your solution right now is to rename that file in the
> repository (perhaps even via dump/transform/load, so it's fixed in

No problem.  It's a test repository; we can just drop and recreate it. 
It's helped to give me some idea of how svn might handle oddball file names 
imported from outside vendors' packages, but there's no history in the 
repository that's critical.

> history, if that's feasible for your team).  Future versions of
> Subversion won't allow that path to be in the repository in the first
> place.

That's a big help.

Two quick questions:
- Is it worth a small patch to xml.c to include the contents of the 
offending buffer in a "Malformed XML" error message?  It could yield a 
long/multiline message, but would help identify the offending text in a 
large operation.
- Is it worth adding a bit of text about path name requirements to the 
docs?  Would it go in the Book (e.g. the UTF/Path requirements bit of the 
developer info, or even a brief blurb about character sets in Ch 1), or 
elsewhere?

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania

Re: "Malformed XML" (l10n?) problems on checkout

Posted by kf...@collab.net.
"Dale Worley" <dw...@pingtel.com> writes:
> > From: Charles Bailey [mailto:bailey@newman.upenn.edu]
> 
> > Sure.  I think I've got it.  By process of elimination, the offending
> > file seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the
> > usual \x08 backspace character).  (No, I've no idea why the creator
> > of this file -- likely OpenOffice.org -- chose to use this name.)
> 
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that, so
> he typed ^H three times, which moved the cursor back three spaces, then he
> typed "test", which wrote over the offending "ooo", but didn't actually
> remove them from the program's input buffer.  Similarly, the final ^H was to
> correct the second "f" so he could replace it with "i".  The name he thought
> he was getting was ".testfile".
> 
> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse, what
> you might think is a valid escape sequence to represent it -- &#8; -- is also
> forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
> 
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
> 
> Subversion may need to extend XML to allow this (and has to verify that its
> XML parser can deal with it).

Thanks for the analysis and the reference, Dale.  You're right, and
Subversion's way of dealing with this is that (now) it no longer
permits such paths in the repository.  See issue #1954, and see r12581
and r12632.  There is a long thread on the topic, linked to from the
issue I believe, that explains the reasoning behind the decision.

So Charles, your solution right now is to rename that file in the
repository (perhaps even via dump/transform/load, so it's fixed in
history, if that's feasible for your team).  Future versions of
Subversion won't allow that path to be in the repository in the first
place.
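
For finding such paths before an import, a small sketch along these lines
may help.  It flags any file or directory name containing an ASCII control
character; the exact set of characters Subversion rejects is an assumption
here (see issue #1954 for the real rule), and the function names are
hypothetical.

```python
import os


def has_control_chars(name):
    """True if the name contains an ASCII control character (0x00-0x1F, 0x7F)."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name)


def find_suspect_paths(root):
    """Return paths under root whose final component has control characters."""
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_control_chars(name):
                suspects.append(os.path.join(dirpath, name))
    return suspects
```

Printing each result with repr() makes the hidden characters visible, for
example '.ooo\x08\x08\x08testff\x08ile'.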

Hope this helps,
-Karl

RE: "Malformed XML" (l10n?) problems on checkout

Posted by Charles Bailey <ba...@newman.upenn.edu>.
--On February 16, 2005 3:30:50 PM -0500 Dale Worley <dw...@pingtel.com> 
wrote:

>> From: Charles Bailey [mailto:bailey@newman.upenn.edu]
>
>> Sure.  I think I've got it.  By process of elimination, the offending
>> file seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the
>> usual \x08 backspace character).  (No, I've no idea why the creator
>> of this file -- likely OpenOffice.org -- chose to use this name.)
>
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that,

That'd've been my first instinct as well.  The location and name strike me 
as quite unlikely for a manual save, though, so I think it might be a cute 
name used by OpenOffice sometime in days past (just based on the '.ooo' 
prefix).

> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse,
> what you might think is a valid escape sequence to represent it -- &#8; --
> is also forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
>
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
>
> Subversion may need to extend XML to allow this (and has to verify that
> its XML parser can deal with it).

Interestingly, the W3C's 1.1 recommendation and i18n FAQ seem to indicate 
that numeric character references will be legal for control codes other 
than NUL in XML 1.1 
(<http://www.w3.org/International/questions/qa-controls>).  That may mean 
XML parser support isn't that unlikely.  NUL is still a problem, but XML is 
the least of the areas that'd bite most programs.
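
The difference between the two versions can be sketched as code point
checks taken from the Char productions of the XML 1.0 and XML 1.1
recommendations.  The function names are hypothetical; in XML 1.1 the
"restricted" characters are legal only when written as numeric character
references such as &#8;.

```python
def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x09, 0x0A, 0x0D)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp):
    """True if the code point matches the XML 1.1 Char production."""
    return (0x01 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp):
    """True if XML 1.1 allows the code point only as a character reference."""
    return (0x01 <= cp <= 0x08
            or 0x0B <= cp <= 0x0C
            or 0x0E <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)
```

For the backspace in this thread, is_xml10_char(0x08) is False while
is_xml11_char(0x08) and is_xml11_restricted(0x08) are both True; NUL (0x00)
is rejected by both versions.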

The suggestion of URI-encoding filenames sounds nice, though it wouldn't be 
backwards-compatible.  If the client and server exchange version info, 
perhaps it could be a runtime selection.

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania

Re: "Malformed XML" (l10n?) problems on checkout

Posted by André Malo <nd...@perlig.de>.
* Dale Worley wrote:

> > From: Charles Bailey [mailto:bailey@newman.upenn.edu]
> >
> > Sure.  I think I've got it.  By process of elimination, the offending
> > file seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the
> > usual \x08 backspace character).  (No, I've no idea why the creator
> > of this file -- likely OpenOffice.org -- chose to use this name.)
>
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that,
> so he typed ^H three times, which moved the cursor back three spaces,
> then he typed "test", which wrote over the offending "ooo", but didn't
> actually remove them from the program's input buffer.  Similarly, the
> final ^H was to correct the second "f" so he could replace it with "i". 
> The name he thought he was getting was ".testfile".
>
> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse,
> what you might think is a valid escape sequence to represent it -- &#8; --
> is also forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
>
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
>
> Subversion may need to extend XML to allow this (and has to verify that
> its XML parser can deal with it).

XML 1.1 allows them (except &#0;). No need to do the work of the W3C 
here ;-)
<http://www.w3c.org/TR/2004/REC-xml11-20040204/#sec-xml11>

nd
-- 
"Das Verhalten von Gates hatte mir bewiesen, dass ich auf ihn und seine
beiden Gefährten nicht zu zählen brauchte" -- Karl May, "Winnetou III"

Im Westen was neues: <http://pub.perlig.de/books.html#apache2>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: "Malformed XML" (l10n?) problems on checkout

Posted by Julian Reschke <ju...@gmx.de>.
Dale Worley wrote:
>>From: Charles Bailey [mailto:bailey@newman.upenn.edu]
> 
> 
>>Sure.  I think I've got it.  By process of elimination, the
>>offending file
>>seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is
>>the usual \x08
>>backspace character).  (No, I've no idea why the creator of
>>this file --
>>likely OpenOffice.org -- chose to use this name.)
> 
> 
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that, so
> he typed ^H three times, which moved the cursor back three spaces, then he
> typed "test", which wrote over the offending "ooo", but didn't actually
> remove them from the program's input buffer.  Similarly, the final ^H was to
> correct the second "f" so he could replace it with "i".  The name he thought
> he was getting was ".testfile".
> 
> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse, what
> you might think is a valid escape sequence to represent it -- &8; -- is also
> forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
> 
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
> 
> Subversion may need to extend XML to allow this (and has to verify that its
> XML parser can deal with it).

No. Please. Do not "extend" XML. Extending it means breaking existing 
clients that use conforming XML 1.0 parsers.

The proper way (WebDAV does that, so why not Subversion in other 
places?) is to use URIs rather than names (they will never contain 
control characters).

Best regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: "Malformed XML" (l10n?) problems on checkout

Posted by Charles Bailey <ba...@newman.upenn.edu>.
--On February 16, 2005 3:30:50 PM -0500 Dale Worley <dw...@pingtel.com> 
wrote:

>> From: Charles Bailey [mailto:bailey@newman.upenn.edu]
>
>> Sure.  I think I've got it.  By process of elimination, the
>> offending file
>> seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is
>> the usual \x08
>> backspace character).  (No, I've no idea why the creator of
>> this file --
>> likely OpenOffice.org -- chose to use this name.)
>
> *Why* that file name exists is clear -- someone was trying to type a file
> name into a box.  He typed ".ooo", and then decided he didn't like that,

That'd've been my first instinct as well.  The location and name strike me 
as quite unlikely for a manual save, though, so I think it might be a cute 
name used by OpenOffice sometime in days past (just based on the '.ooo' 
prefix).

> But you've identified the Subversion problem correctly -- a file name can
> contain "non-printable" characters, which are forbidden in XML.  Worse,
> what you might think is a valid escape sequence to represent it -- &8; --
> is also forbidden in XML, because "character entities" are forbidden from
> representing non-printable characters.  See the discussion in
>
> http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref
>
> Subversion may need to extend XML to allow this (and has to verify that
> its XML parser can deal with it).

Interestingly, the W3C's XML 1.1 recommendation and i18n FAQ seem to indicate 
that numeric character references will be legal for control codes other 
than NUL in XML 1.1 
(<http://www.w3.org/International/questions/qa-controls>).  That may mean 
XML parser support isn't that unlikely.  NUL is still a problem, but XML is 
the least of the places where it would bite most programs.

The suggestion of URI-encoding filenames sounds nice, though it wouldn't be 
backwards-compatible.  If the client and server exchange version info, 
perhaps it could be a runtime selection.
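
As a rough illustration of the URI-encoding idea, here is a small Python
sketch.  It is not taken from Subversion; the <entry name="..."/> element is
a made-up stand-in for whatever the working-copy files really contain.  It
only shows that a percent-encoded name survives a trip through an XML parser
and decodes back to the original bytes.

```python
from urllib.parse import quote, unquote
import xml.etree.ElementTree as ET

# The problem file name, with real backspace (0x08) characters in it.
name = ".ooo\x08\x08\x08testff\x08ile"

# Percent-encode everything outside the URI "unreserved" set.
encoded = quote(name, safe="")

# Hypothetical element: the encoded name is plain ASCII, so it parses.
doc = '<entry name="%s"/>' % encoded
parsed = ET.fromstring(doc).get("name")

print(encoded)                  # .ooo%08%08%08testff%08ile
print(unquote(parsed) == name)  # True
```

A literal '%' in a file name is itself encoded (as %25), so decoding stays
unambiguous.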

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania


RE: "Malformed XML" (l10n?) problems on checkout

Posted by Dale Worley <dw...@pingtel.com>.
> From: Charles Bailey [mailto:bailey@newman.upenn.edu]

> Sure.  I think I've got it.  By process of elimination, the offending
> file seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the
> usual \x08 backspace character).  (No, I've no idea why the creator of
> this file -- likely OpenOffice.org -- chose to use this name.)

*Why* that file name exists is clear -- someone was trying to type a file
name into a box.  He typed ".ooo", and then decided he didn't like that, so
he typed ^H three times, which moved the cursor back three spaces, then he
typed "test", which wrote over the offending "ooo", but didn't actually
remove them from the program's input buffer.  Similarly, the final ^H was to
correct the second "f" so he could replace it with "i".  The name he thought
he was getting was ".testfile".
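
To make that sequence concrete, here is a minimal Python sketch of a
terminal that treats ^H as "move the cursor left, erase nothing".  The
function name is made up for illustration, and real terminals and input
widgets differ in the details.

```python
def render_with_backspaces(raw: str) -> str:
    """Return what a simple overwrite-style display would show for raw."""
    cells = []   # characters currently visible on the line
    cursor = 0   # column where the next character will land
    for ch in raw:
        if ch == "\x08":
            cursor = max(0, cursor - 1)  # move left, erase nothing
        elif cursor < len(cells):
            cells[cursor] = ch           # overwrite what is on screen
            cursor += 1
        else:
            cells.append(ch)             # extend the line
            cursor += 1
    return "".join(cells)

stored = ".ooo\x08\x08\x08testff\x08ile"
print(len(stored))                     # 17 characters actually stored
print(render_with_backspaces(stored))  # .testfile
```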

But you've identified the Subversion problem correctly -- a file name can
contain "non-printable" characters, which are forbidden in XML.  Worse, what
you might think is a valid escape sequence to represent it -- &#8; -- is also
forbidden in XML, because "character references" are forbidden from
representing non-printable characters.  See the discussion in

http://www.w3c.org/TR/2004/REC-xml-20040204/#dt-charref

Subversion may need to extend XML to allow this (and has to verify that its
XML parser can deal with it).
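
Both failures are easy to reproduce with any conforming XML 1.0 parser.
Here is a small Python sketch using the standard library's expat-based
ElementTree.  The <entry name="..."/> element and the helper name are
invented for the test, not Subversion's real log format.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

name = ".ooo\x08\x08\x08testff\x08ile"

raw_doc = '<entry name="%s"/>' % name                          # raw 0x08
ref_doc = '<entry name="%s"/>' % name.replace("\x08", "&#8;")  # &#8;
ok_doc = '<entry name=".testfile"/>'                           # control

for label, doc in (("raw", raw_doc), ("ref", ref_doc), ("ok", ok_doc)):
    print(label, parse_error(doc))
```

The raw and character-reference forms should both be rejected, and only the
control document should parse.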

Dale




Re: "Malformed XML" (l10n?) problems on checkout

Posted by Charles Bailey <ba...@newman.upenn.edu>.
--On February 16, 2005 12:27:52 PM -0600 kfogel@collab.net wrote:
>
> This sounds like a problem in a .svn/entries or .svn/log file
> somewhere in your working copy, perhaps due to a bug whereby
> Subversion may be failing to properly escape some XML-sensitive
> filenames or something.
>
> If you can narrow this down to a data set that still reproduces the
> bug, but that you'd be willing to share, that'd be terrific.  Failing

Sure.  I think I've got it.  By process of elimination, the offending file 
seems to be one named '.ooo^H^H^Htestff^Hile' (where ^H is the usual \x08 
backspace character).  (No, I've no idea why the creator of this file -- 
likely OpenOffice.org -- chose to use this name.)  Removing or renaming the 
file led to a successful checkout.  In the .svn/log of a checkout which 
failed, the ^Hs persist in the attribute values containing this file's 
name, yielding an illegal "token".

I haven't had a chance to look at the svn source, but I wonder whether some 
code involved in the checkout process isn't escaping these "low end" chars 
before building an XML stream.
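
For what it's worth, a check for such characters is short.  The sketch below
is Python rather than anything from the svn source, and the function name is
made up.  It flags every character that falls outside the XML 1.0 "Char"
production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF),
which is the set a writer would have to escape or reject.

```python
import re

# Anything NOT matched by the XML 1.0 "Char" production.
_XML10_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def illegal_positions(name: str) -> list:
    """Return the indexes of characters that XML 1.0 cannot carry."""
    return [m.start() for m in _XML10_ILLEGAL.finditer(name)]

name = ".ooo\x08\x08\x08testff\x08ile"
print(illegal_positions(name))         # [4, 5, 6, 13]
print(illegal_positions(".testfile"))  # []
```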

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania

