Posted to fop-dev@xmlgraphics.apache.org by Jeremias Maerki <de...@greenmail.ch> on 2002/11/04 17:02:07 UTC

Line ending chaos in our codebase

Hi there

Before I start working on the multi-threading issues, I wanted to clear
up a few patch submissions. While applying them I ran across several
files that had CRCRLF endings instead of CRLF when checked out using
WinCVS on a Windows box. I think I have successfully corrected those I
ran into. Does anyone have a good idea how to...
1. identify files not having correct line endings without having to open each and every file?
2. enforce correct line endings?

This is probably an old story, but anyway...

Jeremias Maerki


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-dev-unsubscribe@xml.apache.org
For additional commands, email: fop-dev-help@xml.apache.org


Re: Line ending chaos in our codebase

Posted by Jeremias Maerki <de...@greenmail.ch>.
Thanks, Peter and Bertrand, for your answers.

On Tue, 5 Nov 2002 07:21:54 +0100 Bertrand Delacretaz wrote:
> On Monday 04 November 2002 17:02, Jeremias Maerki wrote:
> >. . .Does anyone have a good idea how to...
> > 2. enforce correct line endings?
> 
> Using the commitinfo administrative file, scripts can be configured in CVS to 
> run when a file is committed, at which point you could detect the problem.

I've heard suggestions like this being discussed in other projects, but
they were never put in place. So I guess it really comes down to self-discipline.

> I'm not sure if it's worth the effort though. When such a problem is found, 
> you could also study file revisions to find out who created the problem and 
> tell people to fix their environment.

The person in question seems to have discovered the issue himself and
fixed it. I'm doing some work on the maintenance branch anyway, so I'll
fix the files when I run into them.

Jeremias Maerki




Re: Line ending chaos in our codebase

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday 04 November 2002 17:02, Jeremias Maerki wrote:
>. . .Does anyone have a good idea how to...
> 2. enforce correct line endings?

Using the commitinfo administrative file, scripts can be configured in CVS to 
run when a file is committed, at which point you could detect the problem. 
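A commitinfo filter of that kind could be sketched in plain sh as follows. This is a minimal sketch, not an existing Apache script: the name check-cr.sh, its install path and the list of skipped binary extensions are hypothetical. It assumes, as the CVS manual describes for commitinfo, that the filter receives the repository directory followed by the names of the files involved in the commit, that it runs where those files can be read by name, and that a non-zero exit status aborts the commit. Since a CVS client normally converts its native CRLF to LF when committing a text file, any CR that still arrives is treated as a problem.

```shell
#!/bin/sh
# check-cr.sh: hypothetical commitinfo filter that rejects files containing CRs.
# Assumed CVSROOT/commitinfo entry:   ALL /usr/local/bin/check-cr.sh
# CVS appends the repository directory, then the names of the files involved.

check_cr() {
  repo=$1
  shift
  status=0
  cr=$(printf '\r')
  for f in "$@"; do
    # Removed files are listed as well but are no longer present; skip them.
    [ -f "$f" ] || continue
    # Skip a few binary types (taken from the cvswrappers list); adjust as needed.
    case $f in
      *.gif|*.jpg|*.png|*.jar|*.class|*.zip|*.pdf) continue ;;
    esac
    # Assumes the client already turned CRLF into LF, so any CR left is stray.
    if grep -q "$cr" "$f"; then
      echo "$repo/$f: contains CR characters, please fix its line endings" >&2
      status=1
    fi
  done
  return "$status"
}

# A non-zero exit status makes CVS abort the commit.
if [ "$#" -gt 0 ]; then
  check_cr "$@"
fi
```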

I'm not sure if it's worth the effort though. When such a problem is found, 
you could also study file revisions to find out who created the problem and 
tell people to fix their environment.

-Bertrand



Re: Line ending chaos in our codebase

Posted by "Peter B. West" <pb...@powerup.com.au>.
Bertrand Delacretaz wrote:
> 
> AFAIK as long as the "binary file" flag is not set, CVS takes care of line 
> endings by itself when a file is checked out 
> (http://www.loria.fr/~molli/cvs/doc/cvs_9.html#SEC76), converting them to 
> what's appropriate for the platform.


Bertrand,

This needs a bit more investigation.  As I have said, I have recently 
seen spurious CRLF endings in checked-out files.

Peter
-- 
Peter B. West  pbwest@powerup.com.au  http://www.powerup.com.au/~pbwest/
"Lord, to whom shall we go?"




Re: Line ending chaos in our codebase

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Monday 04 November 2002 23:53, Peter B. West wrote:
>. . .I don't know the
> mechanism for handling line-end differences on entry into a CVS
> repository on a unix box.  
>. . .

AFAIK as long as the "binary file" flag is not set, CVS takes care of line 
endings by itself when a file is checked out 
(http://www.loria.fr/~molli/cvs/doc/cvs_9.html#SEC76), converting them to 
what's appropriate for the platform.

Funny things can happen if people check out files on a unix box and edit them 
from a windows box, but most windows editors handle this correctly.

-Bertrand



Re: Line ending chaos in our codebase

Posted by "Peter B. West" <pb...@powerup.com.au>.
Jeremias,

Never having had the misfortune to work with Windows, I don't know the 
mechanism for handling line-end differences on entry into a CVS 
repository on a unix box.  I assume you have to handle this on your end, 
because I occasionally see files with CRLF out of CVS repositories.  Is 
this the burden of your question 2?

To clean up your files, you might try some perl.

perl -pi.bak -e 'BEGIN{undef $/};s/\r*\r\n/\n/g' file...

should work from a unix command line.  How you achieve this in Windows I 
don't know.  Once it's working, you can just use `-pi' instead of 
`-pi.bak'.  The 'i' option edits files in place; any characters following 
it are appended to the file name to create a backup file.  Handy for testing. 
Obviously, the line above gives me LF-only line endings.
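On a Unix or Cygwin shell without Perl, roughly the same cleanup can be sketched with awk. The helper name strip_cr is hypothetical; like the -pi.bak form it keeps a FILE.bak copy, and it assumes an awk that understands the \r escape in regular expressions, which POSIX awk specifies. Like the Perl regex, it removes only CRs that sit directly before the end of a line. One difference: awk always ends its output with a newline, so a final line that lacked one gains it.

```shell
# strip_cr FILE...: hypothetical helper mirroring the Perl one-liner.
# Each FILE is first copied to FILE.bak (like -pi.bak), then rewritten with any
# run of CRs at the end of a line removed, leaving LF-only line endings.
strip_cr() {
  for f in "$@"; do
    cp -p "$f" "$f.bak" || return 1
    # awk splits records on LF, so trailing CRs are still part of each record.
    awk '{ sub(/\r+$/, ""); print }' "$f.bak" > "$f" || return 1
  done
}
```

For example, strip_cr src/org/apache/fop/apps/Driver.java would leave Driver.java with LF endings and the untouched original in Driver.java.bak.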

Peter

Jeremias Maerki wrote:
> Hi there
> 
> Before I start working on the multi-threading issues, I wanted to clear
> up a few patch submissions. While applying them I ran across several
> files that had CRCRLF endings instead of CRLF when checked out using
> WinCVS on a Windows box. I think I have successfully corrected those I
> ran into. Does anyone have a good idea how to...
> 1. identify files not having correct line endings without having to open each and every file?
> 2. enforce correct line endings?






Re: Line ending chaos in our codebase

Posted by Jeremias Maerki <de...@greenmail.ch>.
Hi Peter

Yes, 'file' works fine for identifying the bad files. It's available on Cygwin.
But I think I'll give my Java stuff a chance anyway.

On Wed, 06 Nov 2002 12:55:05 +1000 Peter B. West wrote:
> The unix 'file' command, with a subsequent check for the word 'text', as 
> mentioned by Victor earlier, is a very good start for file type checking.

Jeremias Maerki




Re: Line ending chaos in our codebase

Posted by "Peter B. West" <pb...@powerup.com.au>.
Jeremias,

The unix 'file' command, with a subsequent check for the word 'text', as 
mentioned by Victor earlier, is a very good start for file type checking.

Peter

Jeremias Maerki wrote:
> Thanks for your input. Your suggestion below smells dangerous, though.
> In the meantime I've started a little check program (in Java) that
> analyzes the line endings of files using regex matching. I think I'll
> expand that into a little project including a command-line app, an Ant task, etc.





Re: Line ending chaos in our codebase

Posted by Jeremias Maerki <de...@greenmail.ch>.
Thanks for your input. Your suggestion below smells dangerous, though.
In the meantime I've started a little check program (in Java) that
analyzes the line endings of files using regex matching. I think I'll
expand that into a little project including a command-line app, an Ant task, etc.

On Tue, 5 Nov 2002 08:08:54 -0700 Victor Mote wrote:
> [...]
> Also, if you want to clean up the files in the repository, I understand that
> running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the
> files as text files instead of binary, which is apparently the root of the
> problem. (I know, -k is for keywords, but cvs handles keyword conversions &
> line-ending conversions in the same space). Make sure you're backed up & do
> some testing to make sure you got what you want.


Jeremias Maerki




Re: Line ending chaos in our codebase

Posted by "Peter B. West" <pb...@powerup.com.au>.
Victor Mote wrote:
> 
> Also, if you want to clean up the files in the repository, I understand that
> running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the
> files as text files instead of binary, which is apparently the root of the
> problem. (I know, -k is for keywords, but cvs handles keyword conversions &
> line-ending conversions in the same space). Make sure you're backed up & do
> some testing to make sure you got what you want.

AFAIK, CVS (how's that for the start of a sentence?) treats files as 
text unless '-kb' is in operation. '-kb' is '-ko' (leave keywords as 
they were in the original checkin of the file) plus binary file I/O. 
I've noticed in this repository that CVS seems to get '-kb' right on, 
e.g., PNG files that I have added and on which I have forgotten to 
specify '-kb'.  I suspect that the guardians of the Apache repository 
have done some work here.

I just had a look.  /home/cvs/CVSROOT/cvswrappers contains the following:

*.gif -k 'b'
*.psd -k 'b'
*.jpg -k 'b'
*.jpeg -k 'b'
*.png -k 'b'
*.psd -k 'b'
*.eps -k 'b'
*.ai -k 'b'
*.jar -k 'b'
*.war -k 'b'
*.class -k 'b'
*.zip -k 'b'
*.ser -k 'b'
*.pdf -k 'b'
*.ico -k 'b'
*.ucs2 -k 'b'
*.ucs4 -k 'b'


'-kkv' is the default keyword expansion form, and is contra-indicated on 
binary files, while '-kb' is contra-indicated on text files, on which 
you definitely want expansion.
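To spot text files that the repository is currently treating as binary, the header that cvs log prints for each file (the -h option limits it to the header) can be filtered. This is a sketch under stated assumptions: the function name binary_members is hypothetical, and it relies on the RCS-style header lines "Working file: ..." and "keyword substitution: ..." appearing once per file. Any .java or other text file it lists would be a candidate for the "cvs admin -kkv FILE" fix Victor mentions.

```shell
# binary_members: hypothetical filter for 'cvs log -h' output read on stdin.
# It prints the working file of every entry whose default keyword
# substitution is "b", i.e. files CVS stores and checks out as binary.
binary_members() {
  awk '
    /^Working file: / { name = substr($0, 15) }   # text after "Working file: "
    /^keyword substitution: b$/ { print name }
  '
}
```

For example: cvs log -h 2>/dev/null | binary_members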

These values, incidentally, come from RCS, and can be read about with 
'man co'.

Unless something has changed recently, CRLF will happily go into the 
repository.  A couple of months back (so it seems) I had occasion to 
strip CRs out of a file which had been committed late one night from a 
Windows system.

Peter




Re: Line ending chaos in our codebase

Posted by Jeremias Maerki <de...@greenmail.ch>.
It works on Cygwin. Here's some sample output from my Windows machine:

$ file /cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/*
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/AWTStarter.java:          ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CVS:                      directory
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CommandLineOptions.java:  ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/CommandLineStarter.java:  ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Driver.java:              ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/ErrorHandler.java:        ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FOInputHandler.java:      ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FOPException.java:        ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Fop.java:                 ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/FormattingResults.java:   ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/InputHandler.java:        ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Options.java:             ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/PageSequenceResults.java: ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/PrintStarter.java:        ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Starter.java:             ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/StreamRenderer.java:      ASCII C program text, with CRLF, CR line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/TraxInputHandler.java:    ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/Version.java:             ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/XSLTInputHandler.java:    ASCII C program text, with CRLF line terminators
/cygdrive/d/FOP/branch/xml-fop/src/org/apache/fop/apps/package.html:             HTML document text

On Wed, 6 Nov 2002 00:09:10 -0700 Victor Mote wrote:
> Tim Landscheidt wrote:
> 
> > Why not just use file(1)?
> >
> > | [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
> > | /var/tmp/test-dos.txt:  ASCII text, with CRLF line terminators
> > | /var/tmp/test-unix.txt: ASCII text
> 
> I thought of that too, but it doesn't work on my Linux box (which reports
> both as "ASCII text"), so it is at least somewhat implementation-dependent.
> It seems like our old SCO system could tell the difference, and my Linux is
> not the latest/greatest so perhaps our CVS server can handle it better. I
> don't trust "file" for anything critical, but, if it makes the distinction
> that you noted above, it is probably adequate for the task at hand.


Jeremias Maerki <de...@greenmail.ch>




RE: Line ending chaos in our codebase

Posted by Victor Mote <vi...@outfitr.com>.
Tim Landscheidt wrote:

> Why not just use file(1)?
>
> | [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
> | /var/tmp/test-dos.txt:  ASCII text, with CRLF line terminators
> | /var/tmp/test-unix.txt: ASCII text

I thought of that too, but it doesn't work on my Linux box (which reports
both as "ASCII text"), so it is at least somewhat implementation-dependent.
It seems like our old SCO system could tell the difference, and my Linux is
not the latest/greatest so perhaps our CVS server can handle it better. I
don't trust "file" for anything critical, but, if it makes the distinction
that you noted above, it is probably adequate for the task at hand.
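Because the wording of 'file' differs between implementations, the terminators can also be counted directly. The helper count_endings is a hypothetical sketch in POSIX sh and awk: awk splits records on LF, so a CR left at the very end of a record marks a CRLF line, and every other CR is a stray one. A CRCRLF file of the kind that started this thread therefore shows non-zero CRLF and CR counts, much like the "with CRLF, CR line terminators" that 'file' reports on Cygwin.

```shell
# count_endings FILE: hypothetical helper that tallies line terminators itself,
# so the result does not depend on the local 'file' implementation.
# Prints "LF=<n> CRLF=<n> CR=<n>", where CR counts CRs not directly before LF.
# A final line lacking its newline is counted as if it had one.
count_endings() {
  awk '
    {
      # A CR just before the record end (the LF) makes this a CRLF line.
      if (sub(/\r$/, "")) crlf++; else lf++
      # Any CR still left is a bare CR, e.g. the first CR of CRCRLF.
      cr += gsub(/\r/, "")
    }
    END { printf "LF=%d CRLF=%d CR=%d\n", lf, crlf, cr }
  ' "$1"
}
```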

Victor Mote




Re: Line ending chaos in our codebase

Posted by Tim Landscheidt <ti...@tim-landscheidt.de>.
Victor Mote <vi...@outfitr.com> wrote:

> [...]
> I have never been able to get grep to detect them. The only way I know (and
> it falls into the category of "beat it to death") is to convert each file
> using tr, then compare it to the old one. Here is a script that I just ran
> on my box that works:
> [...]

Why not just use file(1)?

| [tim@butler ~]$ file /var/tmp/test-{dos,unix}.txt
| /var/tmp/test-dos.txt:  ASCII text, with CRLF line terminators
| /var/tmp/test-unix.txt: ASCII text

Tim



RE: Line ending chaos in our codebase

Posted by Victor Mote <vi...@outfitr.com>.
Jeremias Maerki wrote:

> up a few patch submissions. While applying them I ran across several
> files that had CRCRLF endings instead of CRLF when checked out using
> WinCVS on a Windows box. I think I have successfully corrected those I
> ran into. Does anyone have a good idea how to...
> 1. identify files not having correct line endings without having to
> open each and every file?

I have never been able to get grep to detect them. The only way I know (and
it falls into the category of "beat it to death") is to convert each file
using tr, then compare it to the old one. Here is a script that I just ran
on my box that works:

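cd /u/vic/fop-trunk || exit 1
# Read one path per line so that file names containing spaces survive.
find . -type f | while IFS= read -r I
do
  # Delete every CR (octal 015); if the copy differs, the original had CRs.
  tr -d '\015' < "$I" > /u/tmp/QQtest
  if ! cmp -s "$I" /u/tmp/QQtest
  then
    echo "$I has DOS line-endings"
  fi
done
rm -f /u/tmp/QQtest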

It will include binary files in its output as well. If that is a problem,
add a test to exclude those from consideration (probably using the "file"
command and looking for the word "text").
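Folding that suggestion into the loop could look like this sketch. The function name find_dos_text is hypothetical; it inspects only files whose 'file -b' description (the brief form, without the file name) contains the word "text", and it relies on nothing else in that description, since the exact wording varies between implementations.

```shell
# find_dos_text DIR: hypothetical helper combining the tr check with a 'file'
# test; only files that 'file' describes as text are examined for CRs.
find_dos_text() {
  find "$1" -type f | while IFS= read -r f; do
    case $(file -b "$f") in
      *text*) ;;            # e.g. "ASCII C program text, with CRLF line terminators"
      *) continue ;;        # binary formats such as GIF, JAR or PDF
    esac
    # Deleting every CR (octal 015) changes the file only if it contained one.
    if ! tr -d '\015' < "$f" | cmp -s - "$f"; then
      echo "$f has DOS line-endings"
    fi
  done
}
```

For example: find_dos_text /u/vic/fop-trunk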

Since I have a hybrid Linux/Windows environment here, I feel like the
apostles at the Last Supper ("Lord, is it I?").

Also, if you want to clean up the files in the repository, I understand that
running "cvs admin -kkv FILE" will do so. This will tell cvs to treat the
files as text files instead of binary, which is apparently the root of the
problem. (I know, -k is for keywords, but cvs handles keyword conversions &
line-ending conversions in the same space). Make sure you're backed up & do
some testing to make sure you got what you want.

Victor Mote


