Posted to dev@subversion.apache.org by "C.A.T.Magic" <c....@gmx.at> on 2004/04/03 00:43:59 UTC

bad output format design issue: svn log -v format is not parseable :-(

Ben Collins-Sussman wrote:

> C.A.T.Magic wrote:
> 
>> when looking at
>> http://svn.collab.net/repos/svn/branches/
>> how could one find ALL branches that were ever started?
> 
> Run 'svn log' on the branches directory.

thanks :)


svn log -v http://svn.collab.net/repos/svn/branches/ >svnbranchlog

(ruby)

lines = File.readlines("svnbranchlog")
lines.each { |line|
     line.gsub!('\\','/')
     if ( line =~ /^   D \/branches\/[^\/]*$/i )
         print "#{line}"
     end
}

=====

the log command (+network transfer) took about 10 seconds
and produces ~1MB of output.
filtering on 'deletes' outputs 86 deleted files.

so far so good...

but now i have a --serious-- problem:

how can I tell, by looking at the output
    A \branches\release-0.28.0 (from \trunk:6846)
if this means the filename
     "\branches\release-0.28.0 (from \trunk:6846)"
or the file
     "\branches\release-0.28.0"
???
this is evil!

svn log -v "X:\Work\release-0.28.0 (from trunk6846)"
------------------------------------------------------------------------
r8 | cat | 2004-04-03 02:40:39 +0200 (Sat, 03 Apr 2004) | 1 line
Changed paths:
    A \trunk\release-0.28.0 (from trunk6846)
------------------------------------------------------------------------
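A minimal Ruby sketch of the ambiguity, assuming the "Changed paths" line layout shown above (forward slashes are used here for brevity). The regexp and the helper name parse_changed_path are illustrative assumptions, not part of Subversion:

```ruby
# Hypothetical helper: split one "Changed paths" line into its parts.
# A trailing " (from PATH:REV)" is taken to be copy information.
CHANGED_PATH = /\A\s+([A-Z]) (.+?)(?: \(from (.+):(\d+)\))?\z/

def parse_changed_path(line)
  m = CHANGED_PATH.match(line.chomp)
  return nil unless m
  { action: m[1], path: m[2],
    copyfrom_path: m[3], copyfrom_rev: m[4] && m[4].to_i }
end

# The suffix below has no ":REV", so it stays part of the file name.
plain = parse_changed_path("    A /trunk/release-0.28.0 (from trunk6846)")

# This line is read as a copy of /trunk at r6846. A plain add of the path
# "/branches/release-0.28.0 (from /trunk:6846)" would print the same
# text, so the parser has to guess.
copy = parse_changed_path("   A /branches/release-0.28.0 (from /trunk:6846)")

p plain
p copy
```

The regexp always prefers the "copy" reading when the line ends in " (from PATH:REV)", which is right for ordinary repositories and wrong only for the unusual names discussed in this thread.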

there are several similar issues on other outputs throughout svn.
shouldn't these filenames and usernames be URL-encoded or escaped
everywhere!???
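A rough sketch of what such escaping could look like, written in Ruby. This is only an illustration of the idea; svn does not print paths this way, and escape_path is a made-up helper name:

```ruby
require 'erb'

# Percent-encode each path segment so that spaces, parentheses and colons
# can no longer be confused with a " (from PATH:REV)" suffix.
def escape_path(path)
  path.split('/', -1).map { |segment| ERB::Util.url_encode(segment) }.join('/')
end

escaped = escape_path("/trunk/release-0.28.0 (from trunk6846)")
puts escaped
```

With the name escaped, any unescaped " (from " in a line can only be the copy marker, at the cost of output that is harder for humans to read.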

please help + fix.
:(
=======
c.a.t.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C. Michael Pilato" <cm...@collab.net>.
Marc Haisenko <ha...@webport.de> writes:

> In the normal command line output I can't think of an alternative as this 
> format also has the advantage of being easily parsed with a regexp, although 
> you can run into a bug if your regexp is not built carefully enough. But in 
> the XML output you could surely make this a tag property pretty
> easily, no ?

Absolutely.  And we do:

$ svn log -r 9212 -v --xml
<?xml version="1.0" encoding="utf-8"?>
<log>
<logentry
   revision="9212">
<author>josander</author>
<date>2004-03-26T09:40:14.875951Z</date>
<paths>
...
<path
   copyfrom-path="/trunk/packages/win32-innosetup/templates/paths_inno_src.iss"
   copyfrom-rev="9211"
   action="A">/trunk/packages/win32-innosetup/templates/svn_dynamics.iss</path>
</paths>
...
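For scripts, the attributes can be read without any guessing. A minimal sketch using REXML, the XML parser shipped with Ruby; the sample document is modeled on the output above, but its paths are made up for illustration:

```ruby
require 'rexml/document'

# Sample input shaped like `svn log -v --xml` output; the path names
# are invented to include the troublesome " (from ...)" text.
xml = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <log>
  <logentry
     revision="9212">
  <author>josander</author>
  <paths>
  <path
     copyfrom-path="/trunk/old name (from x)"
     copyfrom-rev="9211"
     action="A">/trunk/new name (from y)</path>
  </paths>
  </logentry>
  </log>
XML

doc = REXML::Document.new(xml)
changes = []
doc.elements.each('log/logentry/paths/path') do |path|
  changes << {
    action:        path.attributes['action'],
    path:          path.text,
    copyfrom_path: path.attributes['copyfrom-path'],
    copyfrom_rev:  path.attributes['copyfrom-rev']
  }
end
changes.each { |change| p change }
```

The changed path is the element text and the copy source is carried in separate attributes, so a name containing " (from ...)" cannot be mistaken for copy information.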



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.

Travis P wrote:
> On Apr 5, 2004, at 11:15 AM, C.A.T.Magic wrote:
>> If you follow the thread I tried XML and ended with a new problem,
>> the "#13\r\n" linefeeds inside of the log messages.
> 
> Ah, okay, had forgotten that.  Those issues look surmountable.
> 
> You can see here what is prescribed for white-space and end-of-line 
> handling:
>   http://www.w3.org/TR/2004/REC-xml-20040204/#sec-white-space

thanks for your help and this link. this cleared up a lot.
i now found out that this was a bug with text/binary (windows)
mode on file open.
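A small Ruby sketch of one way to avoid the doubled carriage return, assuming it came from writing an already CR LF terminated message through a text-mode file handle on Windows. The helper name normalize_newlines is hypothetical:

```ruby
require 'tempfile'

# Collapse CR LF and lone CR to LF, so that a later text-mode write on
# Windows (which expands "\n" to "\r\n") cannot produce "\r\r\n".
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# A log message as it might come out of an XML parser, with the "&#13;"
# references already decoded to carriage returns.
msg = "Merged -r2699:2702 changes\r\nfrom /trunk/subversion.\r\n"
clean = normalize_newlines(msg)

# Binary mode writes the bytes exactly as given on every platform.
written = nil
Tempfile.create('svnlog') do |f|
  f.binmode
  f.write(clean)
  f.flush
  written = File.binread(f.path)
end
```

Either step alone is enough: normalize before a text-mode write, or keep the original bytes and write in binary mode.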

> You may find that you have a non-compliant processor.  Compliant processors
> are readily available for C/C++/Java (see my comment below), but I don't 
> know
> about Ruby/Python, etc.

the xml parser supplied with ruby seems to work and produces the
correct output. i've not yet figured out how i get the right xml parser 
callbacks but that's just a matter of time and learning on my part... :)

>> I also think using an XML parser for this task is overkill.
>> I'd get a new dependency on an XML parser module in my tools,
>> have to find a free or at least LGPL parser that is bugfree and
>> up-to-date etc...

> Well, Subversion uses expat which is included with apr-util. It is 
> included in the source tarball and has a BSD-style license.  For Java 
> and C/C++ apps, I've used Xerces (http://xml.apache.org ; again, a very 
> liberal license ).  You may find that once you have such a reusable 
> software component, it proves its worth in numerous other applications. 
>  But then again, perhaps not for your tasks.

i have to experiment a bit more with existing xml parsers and then 
rewrite some parts of my tools... but I agree that it's not a bad thing
to have an xml parser at hand.


thanks for all the responses on this thread.
c.a.t.



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Travis P <sv...@castle.fastmail.fm>.
On Apr 5, 2004, at 11:15 AM, C.A.T.Magic wrote:

> Marc Haisenko wrote:
>
>> On Monday 05 April 2004 14:54, kfogel@collab.net wrote:
>>> Saying this output is "not parseable" or "evil" is a bit strong :-).
>>>
>>> The output is, in a very rare edge case, ambiguous.  If you create
>>> files whose names end in " (from ...)", you will have a problem, yes.
>>> Sorry for that -- perhaps we should have been better about escaping.
>>> Nevertheless, *do* you actually create such files?  If not, the 
>>> output
>>> is quite easy to parse.
>> But someone might do so sometime, and the universe likes to produce 
>> people that do just these things from which other people think "this 
>> won't happen, or happen rarely" ;-) It's always the same... and a 
>> filename ".*(from.*)" is not all that unlikely.
>> In the normal command line output I can't think of an alternative as 
>> this format also has the advantage of being easily parsed with a 
>> regexp, although you can run into a bug if your regexp is not built 
>> carefully enough. But in the XML output you could surely make this a 
>> tag property pretty easily, no ?
>
> If you follow the thread I tried XML and ended with a new problem,
> the "#13\r\n" linefeeds inside of the log messages.

Ah, okay, had forgotten that.  Those issues look surmountable.

You can see here what is prescribed for white-space and end-of-line 
handling:
   http://www.w3.org/TR/2004/REC-xml-20040204/#sec-white-space

You may find that you have a non-compliant processor.  Compliant 
processors
are readily available for C/C++/Java (see my comment below), but I 
don't know
about Ruby/Python, etc.

> I also think using an XML parser for this task is overkill.
> I'd get a new dependency on an XML parser module in my tools,
> have to find a free or at least LGPL parser that is bugfree and
> up-to-date etc...

Well, Subversion uses expat which is included with apr-util. It is 
included in the source tarball and has a BSD-style license.  For Java 
and C/C++ apps, I've used Xerces (http://xml.apache.org ; again, a very 
liberal license ).  You may find that once you have such a reusable 
software component, it proves its worth in numerous other applications. 
  But then again, perhaps not for your tasks.

> maybe my simple "TAB" patch will do.

That it may.

-Travis



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.
Marc Haisenko wrote:

> On Monday 05 April 2004 14:54, kfogel@collab.net wrote:
> 
>>Saying this output is "not parseable" or "evil" is a bit strong :-).
>>
>>The output is, in a very rare edge case, ambiguous.  If you create
>>files whose names end in " (from ...)", you will have a problem, yes.
>>Sorry for that -- perhaps we should have been better about escaping.
>>Nevertheless, *do* you actually create such files?  If not, the output
>>is quite easy to parse.
> 
> 
> But someone might do so sometime, and the universe likes to produce people 
> that do just these things from which other people think "this won't happen, 
> or happen rarely" ;-) It's always the same... and a filename ".*(from.*)" is 
> not all that unlikely.
> 
> In the normal command line output I can't think of an alternative as this 
> format also has the advantage of being easily parsed with a regexp, although 
> you can run into a bug if your regexp is not built carefully enough. But in 
> the XML output you could surely make this a tag property pretty easily, no ?

If you follow the thread I tried XML and ended with a new problem,
the "#13\r\n" linefeeds inside of the log messages.
I also think using an XML parser for this task is overkill.
I'd get a new dependency on an XML parser module in my tools,
have to find a free or at least LGPL parser that is bugfree and
up-to-date etc...
maybe my simple "TAB" patch will do.

:-)
c.a.t.



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Marc Haisenko <ha...@webport.de>.
On Monday 05 April 2004 14:54, kfogel@collab.net wrote:
> Saying this output is "not parseable" or "evil" is a bit strong :-).
>
> The output is, in a very rare edge case, ambiguous.  If you create
> files whose names end in " (from ...)", you will have a problem, yes.
> Sorry for that -- perhaps we should have been better about escaping.
> Nevertheless, *do* you actually create such files?  If not, the output
> is quite easy to parse.

But someone might do so sometime, and the universe likes to produce people 
that do just these things from which other people think "this won't happen, 
or happen rarely" ;-) It's always the same... and a filename ".*(from.*)" is 
not all that unlikely.

In the normal command line output I can't think of an alternative as this 
format also has the advantage of being easily parsed with a regexp, although 
you can run into a bug if your regexp is not built carefully enough. But in 
the XML output you could surely make this a tag property pretty easily, no ?

-- 
Marc Haisenko
Systemspezialist
Webport IT-Services GmbH
mailto: haisenko@webport.de


Re: [PATCH] Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.
Travis P wrote:

> 
> On Apr 5, 2004, at 11:06 AM, C.A.T.Magic wrote:
> 
>> patch1: svn status -v now truncates long usernames to the specified 12 
>> characters to prevent ambiguity like in
>>   >svn status -v
>>   >                11       11 cat            a file.txt
>>   >                12       12 administrator johnMy spaced File.c
>>
>> -    printf ("%c%c%c%c%c  %c   %6s   %6s %-12s %s\n",
>> +    printf ("%c%c%c%c%c  %c   %6s   %6s %-12.12s %s\n",
> 
> 
> How'd you get 12 above? Wouldn't there be a space between john and My?  
> (There is a space in the format string prior to the final %s.)
> Truncation of user names will naturally create new ambiguities between, 
> e.g.
>   "administrator john" and "administrator sally" who will both appear as 
> "administrato"

because currently i don't care about the name (...) and
most applications work like this (cut off too long names).
cutting off is better because this way you can expect a certain item
to start at a specific column. of course, the TAB solution would work in 
this case too, if you prefer that one.
hmmm ... will the %6 ever overflow? at least the %c are stable.
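To make the trade-off concrete, here is a small Ruby sketch. Ruby's format follows the same width and precision rules as C's printf for these conversions; the user names are the ones used as examples in this thread:

```ruby
# %-12s    pads to at least 12 columns but never truncates.
# %-12.12s pads to 12 columns and also cuts anything longer than 12.
users = ["cat", "administrator john", "administrator sally"]

users.each do |user|
  padded    = format("%-12s|", user)
  truncated = format("%-12.12s|", user)
  puts "#{padded}  #{truncated}"
end

# Truncation keeps the columns aligned, but both administrators now
# print as the same 12 characters.
john  = format("%-12.12s", "administrator john")
sally = format("%-12.12s", "administrator sally")

# A field width is only a minimum, so %6s overflows for a 7-digit value.
wide = format("%6s", "1234567")
```

So the precision fixes the column position of the file name, at the price of ambiguous user names, and the two %6s revision columns can still shift the layout once a value needs more than six characters.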

> Given your other patch, I would think you'd want to just use tab between 
> user name and file name.
> (Not that either of these is necessarily good for Subversion trunk.  Not 
> sure why you aren't exploring the XML output option.)

-) svn status does not have --xml output
-) there is probably a new issue with #13\r\n in the xml output.
-) xml is annoying to parse, good xml parsers are not free, and not easy 
to embed in existing code (callbacks, etc). it also adds a new external 
dependency to my code.

======
c.a.t.



Re: [PATCH] Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Travis P <sv...@castle.fastmail.fm>.
On Apr 5, 2004, at 11:06 AM, C.A.T.Magic wrote:

> patch1: svn status -v now truncates long usernames to the specified 12 
> characters to prevent ambiguity like in
>   >svn status -v
>   >                11       11 cat            a file.txt
>   >                12       12 administrator johnMy spaced File.c
>
> -    printf ("%c%c%c%c%c  %c   %6s   %6s %-12s %s\n",
> +    printf ("%c%c%c%c%c  %c   %6s   %6s %-12.12s %s\n",

How'd you get 12 above? Wouldn't there be a space between john and My?  
(There is a space in the format string prior to the final %s.)
Truncation of user names will naturally create new ambiguities between, 
e.g.
   "administrator john" and "administrator sally" who will both appear 
as "administrato"
Given your other patch, I would think you'd want to just use tab 
between user name and file name.
(Not that either of these is necessarily good for Subversion trunk.  
Not sure why you aren't exploring the XML output option.)

-Travis


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

[PATCH] Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.

yes, at least the longer usernames with whitespaces
were a hassle before i patched the svn cli.
and if you ever have a name like
"patch2.3 (from customer)" in your
repository it's at least annoying to find the exact
regexp for the log parser.


i'd like to suggest the following small patches,

but i'm not sure if anybody else (who doesn't care
for spaces) relies on the current /ambiguous/ output
format. an optional solution could be to use a
colon instead of the tab, if you don't like tabs
to format your output. i found the TAB least intrusive.


my (suggested) CHANGES:


fixed two output format ambiguities:

patch1: svn status -v now truncates long usernames to the specified 12 
characters to prevent ambiguity like in
   >svn status -v
   >                11       11 cat            a file.txt
   >                12       12 administrator johnMy spaced File.c

patch2: svn log now separates the path and the copyfrom path with a TAB, 
since a tab cannot occur in a filename, to prevent ambiguity like in
   >svn log -v
   > A \tag\patch1.2 (from customer) (from \trunk\patch1.2:623)
this now changes to
   > A \tag\patch1.2 (from customer)<TAB>(from \trunk\patch1.2:623)
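A Ruby sketch of how a script could read the TAB-separated form. It assumes the format proposed in patch2, which is not what a stock svn client prints, and parse_tabbed is a hypothetical helper name (forward slashes used for brevity):

```ruby
# Split on the first TAB: everything before it is "ACTION PATH",
# everything after it is the optional "(from PATH:REV)" part.
def parse_tabbed(line)
  left, copyfrom = line.chomp.split("\t", 2)
  m = /\A\s+([A-Z]) (.+)\z/.match(left.to_s)
  return nil unless m
  result = { action: m[1], path: m[2] }
  if copyfrom && (c = /\A\(from (.+):(\d+)\)\z/.match(copyfrom))
    result[:copyfrom_path] = c[1]
    result[:copyfrom_rev]  = c[2].to_i
  end
  result
end

copied = parse_tabbed("   A /tag/patch1.2 (from customer)\t(from /trunk/patch1.2:623)")
added  = parse_tabbed("   A /tag/patch1.2 (from customer)")
p copied
p added
```

Because the separator is a character that is assumed never to appear in a path, the " (from customer)" text stays with the file name in both cases.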


======
c.a.t.


=============================



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by kf...@collab.net.
Saying this output is "not parseable" or "evil" is a bit strong :-).

The output is, in a very rare edge case, ambiguous.  If you create
files whose names end in " (from ...)", you will have a problem, yes.
Sorry for that -- perhaps we should have been better about escaping.
Nevertheless, *do* you actually create such files?  If not, the output
is quite easy to parse.

In prioritizing, we try to distinguish between theoretical problems
and actual work-stoppers.

-Karl

"C.A.T.Magic" <c....@gmx.at> writes:
> svn log -v http://svn.collab.net/repos/svn/branches/ >svnbranchlog
> 
> (ruby)
> 
> lines = File.readlines("svnbranchlog")
> lines.each { |line|
>      line.gsub!('\\','/')
>      if ( line =~ /^   D \/branches\/[^\/]*$/i )
>          print "#{line}"
>      end
> }
> 
> =====
> 
> the log command (+network transfer) took about 10 seconds
> and produces ~1MB of output.
> filtering on 'deletes' outputs 86 deleted files.
> 
> so far so good...
> 
> but now i have a --serious-- problem:
> 
> how can I tell, by looking at the output
>     A \branches\release-0.28.0 (from \trunk:6846)
> if this means the filename
>      "\branches\release-0.28.0 (from \trunk:6846)"
> or the file
>      "\branches\release-0.28.0"
> ???
> this is evil!
> 
> svn log -v "X:\Work\release-0.28.0 (from trunk6846)"
> ------------------------------------------------------------------------
> r8 | cat | 2004-04-03 02:40:39 +0200 (Sat, 03 Apr 2004) | 1 line
> Changed paths:
>     A \trunk\release-0.28.0 (from trunk6846)
> ------------------------------------------------------------------------
> 
> there are several similar issues on other outputs throughout svn.
> shouldn't these filenames and usernames be URL-encoded or escaped
> everywhere!???
> 
> please help + fix.
> :(
> =======
> c.a.t.
> 
> 
> 


Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Bruce Elrick <br...@elrick.ca>.
Hmm...  Maybe I am wrong. 

C.A.T.Magic writes: 

> Bruce Elrick wrote:
>> The XML standard states that all whitespace *between* tags, including 
>> line feeds, is to be preserved by the parser.  It is up to the 
>> application to decide what to do.  If your xml viewer is collapsing 
>> spaces then it is not following the standard
>> I'm not sure what the standard says about spaces in tags, but I think 
>> that can be changed without changing the semantic meaning of the XML.
>> So
>> <path
>> action=">
>> can be changed to <path action="A"> because the carriage return is *in* 
>> the tag, but
>> <sometag>
>> Contents of the
>> tag
>> </sometag>
>> Is such that the string representing the contents of the tag would be
>> "\n  Contents of the\n  tag\n"
>> The application could then choose to manipulate the string.  The initial 
>> line feed comes as a surprise to some people.
>> Cheers...
>> Bruce
> 
> if this is true, then the svn --xml output is REALLY broken because 
> 
> <msg>* branches/issue-531-dev/subversion: Merged -r2699:2702 changes&#13;
> from /trunk/subversion.&#13;
> &#13;
> * subversion/libsvn_fs/reps-strings.c (APR_ARRAY_IDX):&#13;
> Remove duplicate definition.&#13;
> </msg> 
> 
> would turn into something like "\r\r\n" at least on Windows. 
> 
> ======
> c.a.t. 
> 
> 
 



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.
Bruce Elrick wrote:
> The XML standard states that all whitespace *between* tags, including 
> line feeds, is to be preserved by the parser.  It is up to the 
> application to decide what to do.  If your xml viewer is collapsing 
> spaces then it is not following the standard
> I'm not sure what the standard says about spaces in tags, but I think 
> that can be changed without changing the semantic meaning of the XML.
> So
> <path
> action="A">
> can be changed to <path action="A"> because the carriage return is *in* 
> the tag, but
> <sometag>
> Contents of the
> tag
> </sometag>
> Is such that the string representing the contents of the tag would be
> "\n  Contents of the\n  tag\n"
> The application could then choose to manipulate the string.  The initial 
> line feed comes as a surprise to some people.
> Cheers...
> Bruce

if this is true, then the svn --xml output is REALLY broken because

<msg>* branches/issue-531-dev/subversion: Merged -r2699:2702 changes&#13;
from /trunk/subversion.&#13;
&#13;
* subversion/libsvn_fs/reps-strings.c (APR_ARRAY_IDX):&#13;
Remove duplicate definition.&#13;
</msg>

would turn into something like "\r\r\n" at least on Windows.

======
c.a.t.


Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Bruce Elrick <br...@elrick.ca>.
Ack.  My webmail interface butchered the whitespace in my example.  Sigh 

 

Bruce Elrick writes: 

> The XML standard states that all whitespace *between* tags, including line 
> feeds, is to be preserved by the parser.  It is up to the application to 
> decide what to do.  If your xml viewer is collapsing spaces then it is not 
> following the standard  
> 
> I'm not sure what the standard says about spaces in tags, but I think that 
> can be changed without changing the semantic meaning of the XML.  
> 
> So
> <path
> action="A">  
> 
> can be changed to <path action="A"> because the carriage return is *in* 
> the tag, but  
> 
> <sometag>
> Contents of the
> tag
> </sometag>  
> 
> Is such that the string representing the contents of the tag would be
> "\n  Contents of the\n  tag\n"  
> 
> The application could then choose to manipulate the string.  The initial 
> line feed comes as a surprise to some people.  
> 
> Cheers...
> Bruce  
> 
> C.A.T.Magic writes:  
> 
>> Edmund Horner wrote:  
>> 
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1  
>>> 
>>> C.A.T.Magic wrote:
>>> | how can I tell, by looking at the output
>>> |    A \branches\release-0.28.0 (from \trunk:6846)
>>> | if this means the filename
>>> |     "\branches\release-0.28.0 (from \trunk:6846)"
>>> | or the file
>>> |     "\branches\release-0.28.0"
>>> | ???
>>> | this is evil!  
>>> 
>>> What about "svn log --xml" ?  Hopefully it should output something that
>>> is unambiguously machine readable (though requiring an XML parser, of
>>> course).
>> 
>> as far as i can see it the XML output has similar issues:
>> space characters are not escaped and the filenames are
>> not defined as string attributes but just as plaintext
>> between tags.
>> afaik, most XML parsers collapse multiple spaces into a
>> single space.  
>> 
>> when i look at the xml with an xml-viewer the SVN output
>>   <path
>>      action="A">/trunk/  a  file  with  2  spaces</path>
>>   </paths>
>> displays as
>>   <path action="A">/trunk/ a file with 2 spaces</path>  
>> 
>> but i also tried it with the ruby-xml parser
>> and the parsed string was correct
>>   ( "/trunk/  a  file  with  2  spaces" )  
>> 
>> i'm not sure what the xml standard defines
>> for multiple spaces in this case or how other xml
>> parsers handle this.  
>> 
>> i noticed that other characters are escaped,
>> for example semicolons ';' are handled,
>> umlauts 'ö' remain as utf-8.
>>   <path
>>      action="A">/trunk/a&amp;b-ö;2 3.txt</path>
>>   </paths>  
>> 
>> 
>> but even IF i could use --xml for svn log,
>> there are several other commands
>> which fail in the same way:
>>  svn status -v
>>                 8        8 cat          release-0.28.0 (from trunk6846)
>>                11       11 cat            a  file  with  spaces
>>                10       10 cat          a&b-ö;2 3.txt
>>                12       12 administrator johnMyFile.c
>>                12       12 administrator johnMy spaced File.c
>> ---
>> and these commands cannot output xml.  
>> 
>> ======
>> c.a.t.  
>> 
>> 
>> 
>  
> 
> 
> 
 



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Bruce Elrick <br...@elrick.ca>.
The XML standard states that all whitespace *between* tags, including line 
feeds, is to be preserved by the parser.  It is up to the application to 
decide what to do.  If your xml viewer is collapsing spaces then it is not 
following the standard 

I'm not sure what the standard says about spaces in tags, but I think that 
can be changed without changing the semantic meaning of the XML. 

So
<path
 action="A"> 

can be changed to <path action="A"> because the carriage return is *in* the 
tag, but 

<sometag>
 Contents of the
 tag
</sometag> 

Is such that the string representing the contents of the tag would be
"\n  Contents of the\n  tag\n" 

The application could then choose to manipulate the string.  The initial 
line feed comes as a surprise to some people. 
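
This is easy to check with a parser. A short Ruby sketch using REXML, assuming its default settings (no whitespace compression configured); the sample strings are taken from this thread:

```ruby
require 'rexml/document'

# Whitespace between the start and end tag is part of the content,
# including the line feed right after <sometag>.
doc = REXML::Document.new("<sometag>\n  Contents of the\n  tag\n</sometag>")
content = doc.root.text
p content

# Runs of spaces inside a path name are handed to the application as is.
path_doc = REXML::Document.new('<path action="A">/trunk/  a  file  with  2  spaces</path>')
path_text = path_doc.root.text
p path_text
```

A viewer that shows single spaces is collapsing them for display; the parsed string still contains every character.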

Cheers...
Bruce 

C.A.T.Magic writes: 

> Edmund Horner wrote: 
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1 
>> 
>> C.A.T.Magic wrote:
>> | how can I tell, by looking at the output
>> |    A \branches\release-0.28.0 (from \trunk:6846)
>> | if this means the filename
>> |     "\branches\release-0.28.0 (from \trunk:6846)"
>> | or the file
>> |     "\branches\release-0.28.0"
>> | ???
>> | this is evil! 
>> 
>> What about "svn log --xml" ?  Hopefully it should output something that
>> is unambiguously machine readable (though requiring an XML parser, of
>> course).
> 
> as far as i can see it the XML output has similar issues:
> space characters are not escaped and the filenames are
> not defined as string attributes but just as plaintext
> between tags.
> afaik, most XML parsers collapse multiple spaces into a
> single space. 
> 
> when i look at the xml with an xml-viewer the SVN output
>   <path
>      action="A">/trunk/  a  file  with  2  spaces</path>
>   </paths>
> displays as
>   <path action="A">/trunk/ a file with 2 spaces</path> 
> 
> but i also tried it with the ruby-xml parser
> and the parsed string was correct
>   ( "/trunk/  a  file  with  2  spaces" ) 
> 
> i'm not sure what the xml standard defines
> for multiple spaces in this case or how other xml
> parsers handle this. 
> 
> i noticed that other characters are escaped,
> for example semicolons ';' are handled,
> umlauts 'ö' remain as utf-8.
>   <path
>      action="A">/trunk/a&amp;b-ö;2 3.txt</path>
>   </paths> 
> 
> 
> but even IF i could use --xml for svn log,
> there are several other commands
> which fail in the same way:
>  svn status -v
>                 8        8 cat          release-0.28.0 (from trunk6846)
>                11       11 cat            a  file  with  spaces
>                10       10 cat          a&b-ö;2 3.txt
>                12       12 administrator johnMyFile.c
>                12       12 administrator johnMy spaced File.c
> ---
> and these commands cannot output xml. 
> 
> ======
> c.a.t. 
 



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.
 >>> C.A.T.Magic wrote:
 >>> | how can I tell, by looking at the output
 >>> |    A \branches\release-0.28.0 (from \trunk:6846)
 >>> | if this means the filename
 >>> |     "\branches\release-0.28.0 (from \trunk:6846)"
 >>> | or the file
 >>> |     "\branches\release-0.28.0"
 >>> | ???

 >> Edmund Horner wrote:
 >> What about "svn log --xml" ?  Hopefully it should output something that
 >> is unambiguously machine readable (though requiring an XML parser, of
 >> course).

 > C.A.T.Magic wrote:
 > as far as i can see it the XML output has similar issues:
 > space characters are not escaped and the filenames are
 > not defined as string attributes but just as plaintext
 > between tags.
 > afaik, most XML parsers collapse multiple spaces into a
 > single space.
 >
 > when i look at the xml with an xml-viewer the SVN output
 >   <path
 >      action="A">/trunk/  a  file  with  2  spaces</path>
 >   </paths>
 > displays as
 >   <path action="A">/trunk/ a file with 2 spaces</path>
 >
 > but i also tried it with the ruby-xml parser
 > and the parsed string was correct
 >   ( "/trunk/  a  file  with  2  spaces" )
 >
 > i'm not sure what the xml standard defines
 > for multiple spaces in this case or how other xml
 > parsers handle this.
 >
 > i noticed that other characters are escaped,
 > for example ampersands '&' become '&amp;' (so semicolons ';' are handled),
 > umlauts 'ö' remain as utf-8.
 >   <path
 >      action="A">/trunk/a&amp;b-ö;2 3.txt</path>
 >   </paths>
 >
 >
 > but even IF i could use --xml for svn log,
 > there are several other commands
 > which fail in the same way:
 >  svn status -v
 >                 8        8 cat          release-0.28.0 (from trunk6846)
 >                11       11 cat            a  file  with  spaces
 >                10       10 cat          a&b-ö;2 3.txt
 >                12       12 administrator johnMyFile.c
 >                12       12 administrator johnMy spaced File.c
 > ---
 > and these commands cannot output xml.
 >
 > ======
 > c.a.t.

I consider these a P2 issue, because they really prevent me from
creating scripts that rely on the svn output.
How should I file this issue?
As two or three issues (one for each svn function, one for xml),
or as one issue that just includes this mail...?


I quick-fixed our svn commandline version to output a TAB after
filenames and after usernames, but I'd like to know if this
is the way to go for the svn team as well, or if I have to
merge that patch in every time I update svn... ?
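
A rough Ruby sketch of why a TAB delimiter helps a script. The line layout
used here is purely an assumption for illustration (two revision numbers,
the author, a TAB, the path, a TAB). It is not the output of stock svn and
may not match the local patch exactly. The names STATUS_LINE and
parse_patched_status are made up as well.

```ruby
# Assumed layout (hypothetical, not stock svn):
#   "<wc rev> <last changed rev> <author>\t<path>\t"
STATUS_LINE = /\A\s*(\d+)\s+(\d+)\s+([^\t]+)\t([^\t]+)\t?\s*\z/

# Returns a hash for a line in the assumed layout, or nil when the line
# carries no TAB and the author/path boundary cannot be determined.
def parse_patched_status(line)
  m = STATUS_LINE.match(line.chomp)
  return nil unless m
  { wc_rev: m[1].to_i, changed_rev: m[2].to_i, author: m[3], path: m[4] }
end

# With the TAB, "administrator john" + "MyFile.c" and
# "administrator" + "johnMy spaced File.c" are no longer confusable.
p parse_patched_status("   12   12 administrator john\tMyFile.c\t")
p parse_patched_status("   12   12 administrator\tjohnMy spaced File.c\t")
```

Because a TAB never appears inside the author or path in this sketch, the
split is unambiguous. A path that itself contains a TAB would still need
escaping.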


thanks,
======
c.a.t.



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by "C.A.T.Magic" <c....@gmx.at>.
Edmund Horner wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> C.A.T.Magic wrote:
> | how can I tell, by looking at the output
> |    A \branches\release-0.28.0 (from \trunk:6846)
> | if this means the filename
> |     "\branches\release-0.28.0 (from \trunk:6846)"
> | or the file
> |     "\branches\release-0.28.0"
> | ???
> | this is evil!
> 
> What about "svn log --xml" ?  Hopefully it should output something that
> is unambiguously machine readable (though requiring an XML parser, of
> course).

as far as i can see it the XML output has similar issues:
space characters are not escaped and the filenames are
not defined as string attributes but just as plaintext
between tags.
afaik, most XML parsers collapse multiple spaces into a
single space.

when i look at the xml with an xml-viewer the SVN output
   <path
      action="A">/trunk/  a  file  with  2  spaces</path>
   </paths>
displays as
   <path action="A">/trunk/ a file with 2 spaces</path>

but i also tried it with the ruby-xml parser
and the parsed string was correct
   ( "/trunk/  a  file  with  2  spaces" )

i'm not sure what the xml standard defines
for multiple spaces in this case or how other xml
parsers handle this.

i noticed that other characters are escaped,
for example ampersands '&' become '&amp;' (so semicolons ';' are handled),
umlauts 'ö' remain as utf-8.
   <path
      action="A">/trunk/a&amp;b-ö;2 3.txt</path>
   </paths>


but even IF i could use --xml for svn log,
there are several other commands
which fail in the same way:
  svn status -v
                 8        8 cat          release-0.28.0 (from trunk6846)
                11       11 cat            a  file  with  spaces
                10       10 cat          a&b-ö;2 3.txt
                12       12 administrator johnMyFile.c
                12       12 administrator johnMy spaced File.c
---
and these commands cannot output xml.

======
c.a.t.



Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by Edmund Horner <ch...@chrysophylax.cjb.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

C.A.T.Magic wrote:
| how can I tell, by looking at the output
|    A \branches\release-0.28.0 (from \trunk:6846)
| if this means the filename
|     "\branches\release-0.28.0 (from \trunk:6846)"
| or the file
|     "\branches\release-0.28.0"
| ???
| this is evil!

What about "svn log --xml" ?  Hopefully it should output something that
is unambiguously machine readable (though requiring an XML parser, of
course).
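
A minimal Ruby sketch of that approach, assuming REXML is available. The
sample document is hand-written to resemble "svn log -v --xml" output and
its values are made up. The copyfrom-path and copyfrom-rev attribute names
are an assumption to check against real output. The method name
changed_paths is hypothetical.

```ruby
require 'rexml/document'

# Hand-written sample; not captured from a real repository.
SAMPLE_LOG = <<~XML
  <log>
  <logentry revision="8">
  <author>cat</author>
  <paths>
  <path action="A">/trunk/release-0.28.0 (from trunk6846)</path>
  <path copyfrom-path="/trunk" copyfrom-rev="6846"
     action="A">/branches/release-0.28.0</path>
  </paths>
  <msg>example</msg>
  </logentry>
  </log>
XML

# Collect one hash per changed path. Copy information is read from
# attributes, so it cannot be confused with text inside the file name.
def changed_paths(xml)
  result = []
  doc = REXML::Document.new(xml)
  doc.elements.each('log/logentry') do |entry|
    rev = entry.attributes['revision'].to_i
    entry.elements.each('paths/path') do |path|
      result << { revision: rev,
                  action: path.attributes['action'],
                  path: path.text,
                  copyfrom_path: path.attributes['copyfrom-path'],
                  copyfrom_rev: path.attributes['copyfrom-rev'] }
    end
  end
  result
end

changed_paths(SAMPLE_LOG).each { |entry| p entry }
```

Here a file whose name merely ends in " (from trunk6846)" has no copy
attributes, while the real copy does, so the two cases stay distinct.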

Edmund.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAbktpEbvImmpUq7gRAtlbAKDyx3jQiphxORffajw3j3NWoygnsgCgvWa2
rWKVHj5ILS/hWiuDG0RXc1I=
=5hZX
-----END PGP SIGNATURE-----


Re: bad output format design issue: svn log -v format is not parseable :-(

Posted by kf...@collab.net.
Saying this output is "not parseable" or "evil" is a bit strong :-).

The output is, in a very rare edge case, ambiguous.  If you create
files whose names end in " (from ...)", you will have a problem, yes.
Sorry for that -- perhaps we should have been better about escaping.
Nevertheless, *do* you actually create such files?  If not, the output
is quite easy to parse.
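
A small Ruby sketch of both halves of that statement, assuming the
"Changed paths" line layout quoted in this thread. The names CHANGED_PATH
and parse_changed_path are made up for this illustration. The pattern
treats a trailing " (from PATH:REV)" as copy information, which is exactly
the guess that goes wrong for a file whose name ends that way.

```ruby
# One "Changed paths" line: indentation, action letter, path, and an
# optional " (from PATH:REV)" suffix that is interpreted as a copy source.
CHANGED_PATH = /\A\s+([ADMR]) (.+?)(?: \(from (.+):(\d+)\))?\s*\z/

# Returns a hash for a changed-path line, or nil for any other line.
def parse_changed_path(line)
  m = CHANGED_PATH.match(line.chomp)
  return nil unless m
  { action: m[1], path: m[2],
    copyfrom_path: m[3], copyfrom_rev: m[4] && m[4].to_i }
end

# The common case parses cleanly.
p parse_changed_path("   A /branches/release-0.28.0 (from /trunk:6846)")

# No ":REV" in the suffix, so the whole text is taken as the path.
p parse_changed_path("    A /trunk/release-0.28.0 (from trunk6846)")

# The edge case: a plain add of a file literally named
# "release-0.28.0 (from /trunk:6846)" prints the same text as the first
# line, so no parser of this format can tell the two apart.
```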

In prioritizing, we try to distinguish between theoretical problems
and actual work-stoppers.

-Karl

"C.A.T.Magic" <c....@gmx.at> writes:
> svn log -v http://svn.collab.net/repos/svn/branches/ >svnbranchlog
> 
> (ruby)
> 
> lines = File.readlines("svnbranchlog")
> lines.each { |line|
>      line.gsub!('\\','/')
>      if ( line =~ /^   D \/branches\/[^\/]*$/i )
>          print "#{line}"
>      end
> }
> 
> =====
> 
> the log command (+network transfer) took about 10 seconds
> and produces ~1MB of output.
> filtering on 'deletes' outputs 86 deleted files.
> 
> so far so good...
> 
> but now i have a --serious-- problem:
> 
> how can I tell, by looking at the output
>     A \branches\release-0.28.0 (from \trunk:6846)
> if this means the filename
>      "\branches\release-0.28.0 (from \trunk:6846)"
> or the file
>      "\branches\release-0.28.0"
> ???
> this is evil!
> 
> svn log -v "X:\Work\release-0.28.0 (from trunk6846)"
> ------------------------------------------------------------------------
> r8 | cat | 2004-04-03 02:40:39 +0200 (Sat, 03 Apr 2004) | 1 line
> Changed paths:
>     A \trunk\release-0.28.0 (from trunk6846)
> ------------------------------------------------------------------------
> 
> there are several similar issues on other outputs throughout svn.
> shouldn't these filenames and usernames be URL-encoded or escaped
> everywhere!???
> 
> please help + fix.
> :(
> =======
> c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org