Posted to dev@commons.apache.org by sebb <se...@gmail.com> on 2019/06/17 21:26:45 UTC

[CODEC] CRLF files in macOS checkout

Most of the files in my clone of codec have LF endings, however a few are CRLF:

./README.md
./src/assembly/bin.xml
./src/assembly/src.xml
./src/changes/changes.xml
./src/main/java/org/apache/commons/codec/cli/Digest.java
./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java


This causes spurious differences when the files are updated.

Can these files be easily fixed without causing huge diffs to be generated?

Also, is there any way to prevent such files being committed to the repo?

S.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [CODEC] CRLF files in macOS checkout

Posted by Alex Herbert <al...@gmail.com>.

> On 18 Jun 2019, at 18:59, sebb <se...@gmail.com> wrote:
> 
> On Tue, 18 Jun 2019 at 16:01, Alex Herbert <al...@gmail.com> wrote:
>> 
>> 
>> On 18/06/2019 15:38, sebb wrote:
>>> On Tue, 18 Jun 2019 at 12:58, Alex Herbert <al...@gmail.com> wrote:
>>>> 
>>>> On 18/06/2019 11:00, sebb wrote:
>>>>> On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
>>>>>> On 18/06/2019 09:55, sebb wrote:
>>>>>>> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
>>>>>>>> On 17.06.2019 23:26, sebb wrote:
>>>>>>>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
>>>>>>>>> 
>>>>>>>>> ./README.md
>>>>>>>>> ./src/assembly/bin.xml
>>>>>>>>> ./src/assembly/src.xml
>>>>>>>>> ./src/changes/changes.xml
>>>>>>>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
>>>>>>>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
>>>>>>>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
>>>>>>>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
>>>>>>>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
>>>>>>>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
>>>>>>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This causes spurious differences when the files are updated.
>>>>>>>>> 
>>>>>>>>> Can these files be easily fixed without causing huge diffs to be generated?
>>>>>>>>> 
>>>>>>>>> Also, is there any way to prevent such files being committed to the repo?
>>>>>>>>> 
>>>>>>>>> S.
>>>>>>>> If svn:eol-style is set to "native", it shouldn't matter. I think this
>>>>>>>> can be defaulted for newly added files.
>>>>>>> Thanks, but this is Git, not SVN.
>>>>>>> 
>>>>>>>> In Jackrabbit, I regularly run a script to spot new files missing the
>>>>>>>> property.
>>>>>>> Are you willing to share the script?
>>>>>> This was recently a problem in [statistics]. It was fixed using a
>>>>>> .gitattributes file [1] containing:
>>>>>> 
>>>>>> * text=auto
>>>>>> 
>>>>>> You can fix all the existing files following the steps detailed on the
>>>>>> git documentation:
>>>>>> 
>>>>>> $ echo "* text=auto" >.gitattributes
>>>>>> 
>>>>>> $ git add --renormalize .
>>>>>> 
>>>>>> $ git status        # Show files that will be normalized
>>>>>> 
>>>>>> $ git commit -m "Introduce end-of-line normalization"
>>>>> Thanks, though that did not pick up two of the files.
>>>> Oh dear.
>>>> 
>>>> When I tried this locally it misses from your list:
>>>> 
>>>> ./src/changes/changes.xml
>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>>> 
>>>> Those files are also ignored on my machine (linux) by dos2unix. They are
>>>> not found by any of the following [1]:
>>>> 
>>>> $ grep -IUr --color "^M" src
>>>> $ find src -type f | xargs file | grep CRLF
>>>> $ grep -IUlr $'\r' src
>>>> 
>>>> So are they a problem?
>>> I don't know if this causes an issue.
>>> 
>>> I used file on macOS to detect the problem files.
>>> Also my editor (BBEdit) shows the EOL as CRLF for them.
> 
> I've since recloned the repo, and those 2 files don't have CRLF endings.
> Something must have been confused in my workspace.
> 
>> I am currently on linux. I don't have any settings for line endings
>> configured for git [1], i.e. the core.autocrlf property. So if I am
>> correct what I pulled from the master repo is unchanged on checkout. And
>> the two spurious files seem OK for me and 9 require updating.
>> 
>> I can try it again on MacOS later. Maybe something is different there
>> and this is very platform specific.
>> 
>>> 
>>>>> However it looks like the commit message will show huge diffs for each file.
>>>>> 
>>>>> Is that unavoidable?
>>>> The diff is done line-by-line. So if each line changes then it is a big
>>>> diff. I don't know a way around that.
>>>> 
>>>> The alternative would be to leave the .gitattributes file and not commit
>>>> the normalised files. The next time someone commits each of the
>>>> offending files the normalisation will occur as git sends it back to the
>>>> repo. So this just delays the big diff. At least if it is all done at once
>>>> then it makes more sense and avoids the issue of a big diff occurring
>>>> some time in the future and someone has to figure it out all over again.
>>> Agreed it's best done all at once.
>>> 
>>> I remember fixing EOLs on SVN but as I recall it did not create the
>>> huge diffs so long as it was done on the appropriate OS.
>>> Maybe doing it on Windows won't cause the diffs to be created? I may
>>> be able to try that later.
>> 
>> Since windows is the culprit for the CRLF endings it makes sense to try.
> 
> Using Windows does not seem to help; git show shows all lines as different.

It was worth a try.

I saw the EOL commit. Are you going to commit the .gitattributes file as well? I’m indifferent on this. It is recommended for any project that expects contributions from multiple platforms, and it was done on [statistics]. On the one hand, it will stop anyone committing new files with CRLF. On the other hand, Windows users of git should set their core.autocrlf property globally to prevent this.
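
For reference, the two settings being weighed look roughly like the fragment below. This is only a sketch: the .gitattributes line is the one quoted earlier in this thread, while the autocrlf value is the one I recall the GitHub article cited as [1] suggesting for Windows. It is an assumption here, not something taken from the codec repository.

```
# .gitattributes in the repository root: applies to every clone
* text=auto

# per-user Git configuration on Windows (e.g. ~/.gitconfig)
[core]
	autocrlf = true
```

The first setting travels with the repository, so it does not rely on each contributor configuring their own machine. The second only protects commits made by users who have set it.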

> 
>> In this case if you create the .gitattributes file (or configure
>> core.autocrlf) git will know to send the file back to the repo
>> normalised. So you may have to edit each of the offending files with a
>> trivial change to force a commit. The diff should then be the trivial
>> change you made and not the big diff with all the lines.
>> 
>> I don't know what happens on the server side. If you do it in a branch
>> in Github you could compare the two side by side. Either it will show
>> the trivial change or the big diff because on the server side the CRLF
>> was changed and locally (on windows) it was not.
>> 
>>> 
>> [1] https://help.github.com/en/articles/dealing-with-line-endings
>>>> [1]
>>>> https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu
>>>> 
>>>>>> [1] https://git-scm.com/docs/gitattributes
>>>>>> 
>>>>>> 
>>>>>>>> Best regards, Julian

Re: [CODEC] CRLF files in macOS checkout

Posted by sebb <se...@gmail.com>.
On Tue, 18 Jun 2019 at 16:01, Alex Herbert <al...@gmail.com> wrote:
>
>
> On 18/06/2019 15:38, sebb wrote:
> > On Tue, 18 Jun 2019 at 12:58, Alex Herbert <al...@gmail.com> wrote:
> >>
> >> On 18/06/2019 11:00, sebb wrote:
> >>> On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
> >>>> On 18/06/2019 09:55, sebb wrote:
> >>>>> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
> >>>>>> On 17.06.2019 23:26, sebb wrote:
> >>>>>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
> >>>>>>>
> >>>>>>> ./README.md
> >>>>>>> ./src/assembly/bin.xml
> >>>>>>> ./src/assembly/src.xml
> >>>>>>> ./src/changes/changes.xml
> >>>>>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
> >>>>>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
> >>>>>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
> >>>>>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
> >>>>>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
> >>>>>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
> >>>>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
> >>>>>>>
> >>>>>>>
> >>>>>>> This causes spurious differences when the files are updated.
> >>>>>>>
> >>>>>>> Can these files be easily fixed without causing huge diffs to be generated?
> >>>>>>>
> >>>>>>> Also, is there any way to prevent such files being committed to the repo?
> >>>>>>>
> >>>>>>> S.
> >>>>>> If svn:eol-style is set to "native", it shouldn't matter. I think this
> >>>>>> can be defaulted for newly added files.
> >>>>> Thanks, but this is Git, not SVN.
> >>>>>
> >>>>>> In Jackrabbit, I regularly run a script to spot new files missing the
> >>>>>> property.
> >>>>> Are you willing to share the script?
> >>>> This was recently a problem in [statistics]. It was fixed using a
> >>>> .gitattributes file [1] containing:
> >>>>
> >>>> * text=auto
> >>>>
> >>>> You can fix all the existing files following the steps detailed on the
> >>>> git documentation:
> >>>>
> >>>> $ echo "* text=auto" >.gitattributes
> >>>>
> >>>> $ git add --renormalize .
> >>>>
> >>>> $ git status        # Show files that will be normalized
> >>>>
> >>>> $ git commit -m "Introduce end-of-line normalization"
> >>> Thanks, though that did not pick up two of the files.
> >> Oh dear.
> >>
> >> When I tried this locally it misses from your list:
> >>
> >> ./src/changes/changes.xml
> >> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
> >>
> >> Those files are also ignored on my machine (linux) by dos2unix. They are
> >> not found by any of the following [1]:
> >>
> >> $ grep -IUr --color "^M" src
> >> $ find src -type f | xargs file | grep CRLF
> >> $ grep -IUlr $'\r' src
> >>
> >> So are they a problem?
> > I don't know if this causes an issue.
> >
> > I used file on macOS to detect the problem files.
> > Also my editor (BBEdit) shows the EOL as CRLF for them.

I've since recloned the repo, and those 2 files don't have CRLF endings.
Something must have been confused in my workspace.

> I am currently on linux. I don't have any settings for line endings
> configured for git [1], i.e. the core.autocrlf property. So if I am
> correct what I pulled from the master repo is unchanged on checkout. And
> the two spurious files seem OK for me and 9 require updating.
>
> I can try it again on MacOS later. Maybe something is different there
> and this is very platform specific.
>
> >
> >>> However it looks like the commit message will show huge diffs for each file.
> >>>
> >>> Is that unavoidable?
> >> The diff is done line-by-line. So if each line changes then it is a big
> >> diff. I don't know a way around that.
> >>
> >> The alternative would be to leave the .gitattributes file and not commit
> >> the normalised files. The next time someone commits each of the
> >> offending files the normalisation will occur as git sends it back to the
> >> repo. So this just delays the big diff. At least if it is all done at once
> >> then it makes more sense and avoids the issue of a big diff occurring
> >> some time in the future and someone has to figure it out all over again.
> > Agreed it's best done all at once.
> >
> > I remember fixing EOLs on SVN but as I recall it did not create the
> > huge diffs so long as it was done on the appropriate OS.
> > Maybe doing it on Windows won't cause the diffs to be created? I may
> > be able to try that later.
>
> Since windows is the culprit for the CRLF endings it makes sense to try.

Using Windows does not seem to help; git show shows all lines as different.

> In this case if you create the .gitattributes file (or configure
> core.autocrlf) git will know to send the file back to the repo
> normalised. So you may have to edit each of the offending files with a
> trivial change to force a commit. The diff should then be the trivial
> change you made and not the big diff with all the lines.
>
> I don't know what happens on the server side. If you do it in a branch
> in Github you could compare the two side by side. Either it will show
> the trivial change or the big diff because on the server side the CRLF
> was changed and locally (on windows) it was not.
>
> >
> [1] https://help.github.com/en/articles/dealing-with-line-endings
> >> [1]
> >> https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu
> >>
> >>>> [1] https://git-scm.com/docs/gitattributes
> >>>>
> >>>>
> >>>>>> Best regards, Julian


Re: [CODEC] CRLF files in macOS checkout

Posted by Alex Herbert <al...@gmail.com>.
On 18/06/2019 15:38, sebb wrote:
> On Tue, 18 Jun 2019 at 12:58, Alex Herbert <al...@gmail.com> wrote:
>>
>> On 18/06/2019 11:00, sebb wrote:
>>> On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
>>>> On 18/06/2019 09:55, sebb wrote:
>>>>> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
>>>>>> On 17.06.2019 23:26, sebb wrote:
>>>>>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
>>>>>>>
>>>>>>> ./README.md
>>>>>>> ./src/assembly/bin.xml
>>>>>>> ./src/assembly/src.xml
>>>>>>> ./src/changes/changes.xml
>>>>>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
>>>>>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
>>>>>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
>>>>>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
>>>>>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
>>>>>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
>>>>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>>>>>>
>>>>>>>
>>>>>>> This causes spurious differences when the files are updated.
>>>>>>>
>>>>>>> Can these files be easily fixed without causing huge diffs to be generated?
>>>>>>>
>>>>>>> Also, is there any way to prevent such files being committed to the repo?
>>>>>>>
>>>>>>> S.
>>>>>> If svn:eol-style is set to "native", it shouldn't matter. I think this
>>>>>> can be defaulted for newly added files.
>>>>> Thanks, but this is Git, not SVN.
>>>>>
>>>>>> In Jackrabbit, I regularly run a script to spot new files missing the
>>>>>> property.
>>>>> Are you willing to share the script?
>>>> This was recently a problem in [statistics]. It was fixed using a
>>>> .gitattributes file [1] containing:
>>>>
>>>> * text=auto
>>>>
>>>> You can fix all the existing files following the steps detailed on the
>>>> git documentation:
>>>>
>>>> $ echo "* text=auto" >.gitattributes
>>>>
>>>> $ git add --renormalize .
>>>>
>>>> $ git status        # Show files that will be normalized
>>>>
>>>> $ git commit -m "Introduce end-of-line normalization"
>>> Thanks, though that did not pick up two of the files.
>> Oh dear.
>>
>> When I tried this locally it misses from your list:
>>
>> ./src/changes/changes.xml
>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>
>> Those files are also ignored on my machine (linux) by dos2unix. They are
>> not found by any of the following [1]:
>>
>> $ grep -IUr --color "^M" src
>> $ find src -type f | xargs file | grep CRLF
>> $ grep -IUlr $'\r' src
>>
>> So are they a problem?
> I don't know if this causes an issue.
>
> I used file on macOS to detect the problem files.
> Also my editor (BBEdit) shows the EOL as CRLF for them.

I am currently on Linux. I don't have any settings for line endings
configured for git [1], i.e. the core.autocrlf property. So if I am
correct, what I pulled from the master repo is unchanged on checkout. The
two spurious files seem OK for me, and the other 9 require updating.

I can try it again on macOS later. Maybe something is different there
and this is very platform specific.

>
>>> However it looks like the commit message will show huge diffs for each file.
>>>
>>> Is that unavoidable?
>> The diff is done line-by-line. So if each line changes then it is a big
>> diff. I don't know a way around that.
>>
>> The alternative would be to leave the .gitattributes file and not commit
>> the normalised files. The next time someone commits each of the
>> offending files the normalisation will occur as git sends it back to the
>> repo. So this just delays the big diff. At least if it is all done at once
>> then it makes more sense and avoids the issue of a big diff occurring
>> some time in the future and someone has to figure it out all over again.
> Agreed it's best done all at once.
>
> I remember fixing EOLs on SVN but as I recall it did not create the
> huge diffs so long as it was done on the appropriate OS.
> Maybe doing it on Windows won't cause the diffs to be created? I may
> be able to try that later.

Since windows is the culprit for the CRLF endings it makes sense to try. 
In this case if you create the .gitattributes file (or configure 
core.autocrlf) git will know to send the file back to the repo 
normalised. So you may have to edit each of the offending files with a 
trivial change to force a commit. The diff should then be the trivial 
change you made and not the big diff with all the lines.

I don't know what happens on the server side. If you do it in a branch 
in Github you could compare the two side by side. Either it will show 
the trivial change or the big diff because on the server side the CRLF 
was changed and locally (on windows) it was not.

>
[1] https://help.github.com/en/articles/dealing-with-line-endings
>> [1]
>> https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu
>>
>>>> [1] https://git-scm.com/docs/gitattributes
>>>>
>>>>
>>>>>> Best regards, Julian


Re: [CODEC] CRLF files in macOS checkout

Posted by sebb <se...@gmail.com>.
On Tue, 18 Jun 2019 at 12:58, Alex Herbert <al...@gmail.com> wrote:
>
>
> On 18/06/2019 11:00, sebb wrote:
> > On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
> >>
> >> On 18/06/2019 09:55, sebb wrote:
> >>> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
> >>>> On 17.06.2019 23:26, sebb wrote:
> >>>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
> >>>>>
> >>>>> ./README.md
> >>>>> ./src/assembly/bin.xml
> >>>>> ./src/assembly/src.xml
> >>>>> ./src/changes/changes.xml
> >>>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
> >>>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
> >>>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
> >>>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
> >>>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
> >>>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
> >>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
> >>>>>
> >>>>>
> >>>>> This causes spurious differences when the files are updated.
> >>>>>
> >>>>> Can these files be easily fixed without causing huge diffs to be generated?
> >>>>>
> >>>>> Also, is there any way to prevent such files being committed to the repo?
> >>>>>
> >>>>> S.
> >>>> If svn:eol-style is set to "native", it shouldn't matter. I think this
> >>>> can be defaulted for newly added files.
> >>> Thanks, but this is Git, not SVN.
> >>>
> >>>> In Jackrabbit, I regularly run a script to spot new files missing the
> >>>> property.
> >>> Are you willing to share the script?
> >> This was recently a problem in [statistics]. It was fixed using a
> >> .gitattributes file [1] containing:
> >>
> >> * text=auto
> >>
> >> You can fix all the existing files following the steps detailed on the
> >> git documentation:
> >>
> >> $ echo "* text=auto" >.gitattributes
> >>
> >> $ git add --renormalize .
> >>
> >> $ git status        # Show files that will be normalized
> >>
> >> $ git commit -m "Introduce end-of-line normalization"
> > Thanks, though that did not pick up two of the files.
>
> Oh dear.
>
> When I tried this locally it misses from your list:
>
> ./src/changes/changes.xml
> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>
> Those files are also ignored on my machine (linux) by dos2unix. They are
> not found by any of the following [1]:
>
> $ grep -IUr --color "^M" src
> $ find src -type f | xargs file | grep CRLF
> $ grep -IUlr $'\r' src
>
> So are they a problem?

I don't know if this causes an issue.

I used file on macOS to detect the problem files.
Also my editor (BBEdit) shows the EOL as CRLF for them.

> >
> > However it looks like the commit message will show huge diffs for each file.
> >
> > Is that unavoidable?
>
> The diff is done line-by-line. So if each line changes then it is a big
> diff. I don't know a way around that.
>
> The alternative would be to leave the .gitattributes file and not commit
> the normalised files. The next time someone commits each of the
> offending files the normalisation will occur as git sends it back to the
> repo. So this just delays the big diff. At least if it is all done at once
> then it makes more sense and avoids the issue of a big diff occurring
> some time in the future and someone has to figure it out all over again.

Agreed it's best done all at once.

I remember fixing EOLs on SVN but as I recall it did not create the
huge diffs so long as it was done on the appropriate OS.
Maybe doing it on Windows won't cause the diffs to be created? I may
be able to try that later.

> [1]
> https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu
>
> >
> >> [1] https://git-scm.com/docs/gitattributes
> >>
> >>
> >>>> Best regards, Julian


Re: [CODEC] CRLF files in macOS checkout

Posted by Alex Herbert <al...@gmail.com>.
On 18/06/2019 11:00, sebb wrote:
> On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
>>
>> On 18/06/2019 09:55, sebb wrote:
>>> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
>>>> On 17.06.2019 23:26, sebb wrote:
>>>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
>>>>>
>>>>> ./README.md
>>>>> ./src/assembly/bin.xml
>>>>> ./src/assembly/src.xml
>>>>> ./src/changes/changes.xml
>>>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
>>>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
>>>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
>>>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
>>>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
>>>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
>>>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>>>>
>>>>>
>>>>> This causes spurious differences when the files are updated.
>>>>>
>>>>> Can these files be easily fixed without causing huge diffs to be generated?
>>>>>
>>>>> Also, is there any way to prevent such files being committed to the repo?
>>>>>
>>>>> S.
>>>> If svn:eol-style is set to "native", it shouldn't matter. I think this
>>>> can be defaulted for newly added files.
>>> Thanks, but this is Git, not SVN.
>>>
>>>> In Jackrabbit, I regularly run a script to spot new files missing the
>>>> property.
>>> Are you willing to share the script?
>> This was recently a problem in [statistics]. It was fixed using a
>> .gitattributes file [1] containing:
>>
>> * text=auto
>>
>> You can fix all the existing files following the steps detailed on the
>> git documentation:
>>
>> $ echo "* text=auto" >.gitattributes
>>
>> $ git add --renormalize .
>>
>> $ git status        # Show files that will be normalized
>>
>> $ git commit -m "Introduce end-of-line normalization"
> Thanks, though that did not pick up two of the files.

Oh dear.

When I tried this locally it misses from your list:

./src/changes/changes.xml
./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java

Those files are also ignored on my machine (linux) by dos2unix. They are 
not found by any of the following [1]:

$ grep -IUr --color "^M" src
$ find src -type f | xargs file | grep CRLF
$ grep -IUlr $'\r' src

So are they a problem?
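
A more portable variant of the checks above might look like the following. It is a minimal sketch: the helper name list_crlf is hypothetical, and the carriage return is produced with printf so that the pattern does not depend on a literal ^M (presumably typed as Ctrl-V Ctrl-M in the original command) or on the bash-only $'\r' syntax.

```shell
# list_crlf DIR (hypothetical helper, not one of the commands from this thread):
# print every non-binary file under DIR that has at least one line ending in CR LF.
list_crlf() {
  cr=$(printf '\r')   # a literal carriage return; command substitution keeps it
  # -I skips binary files, -l prints only the names of matching files
  find "$1" -type f -exec grep -Il "${cr}\$" {} +
}
```

Calling list_crlf src from the top of the checkout would list the candidates. Inside a Git working tree, git ls-files --eol reports the line endings recorded in the index and those in the working tree separately, which may help explain files that look different on two machines.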

>
> However it looks like the commit message will show huge diffs for each file.
>
> Is that unavoidable?

The diff is done line-by-line. So if each line changes then it is a big 
diff. I don't know a way around that.

The alternative would be to leave the .gitattributes file and not commit 
the normalised files. The next time someone commits each of the 
offending files the normalisation will occur as git sends it back to the 
repo. So this just delays the big diff. At least if it is all done at once
then it makes more sense and avoids the issue of a big diff occurring 
some time in the future and someone has to figure it out all over again.
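
To see in advance how large the normalisation diff will be, one could count the lines that end in CR LF, since each of them becomes a changed line. This is a minimal sketch; the helper name crlf_lines is hypothetical.

```shell
# crlf_lines FILE (hypothetical helper): print the number of lines in FILE
# that end in CR LF, i.e. the lines a CRLF -> LF conversion would rewrite.
crlf_lines() {
  cr=$(printf '\r')   # a literal carriage return
  # grep -c prints the count; it exits non-zero when the count is 0,
  # so that status is deliberately ignored
  grep -c "${cr}\$" "$1" || true
}
```

For a file stored entirely with CRLF endings the count equals the number of lines, which is why the diff covers the whole file. When reviewing such a commit, recent versions of Git accept git show --ignore-cr-at-eol, which hides changes that differ only by a trailing carriage return.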

[1] 
https://stackoverflow.com/questions/73833/how-do-you-search-for-files-containing-dos-line-endings-crlf-with-grep-on-linu

>
>> [1] https://git-scm.com/docs/gitattributes
>>
>>
>>>> Best regards, Julian

Re: [CODEC] CRLF files in macOS checkout

Posted by sebb <se...@gmail.com>.
On Tue, 18 Jun 2019 at 10:40, Alex Herbert <al...@gmail.com> wrote:
>
>
> On 18/06/2019 09:55, sebb wrote:
> > On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
> >> On 17.06.2019 23:26, sebb wrote:
> >>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
> >>>
> >>> ./README.md
> >>> ./src/assembly/bin.xml
> >>> ./src/assembly/src.xml
> >>> ./src/changes/changes.xml
> >>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
> >>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
> >>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
> >>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
> >>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
> >>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
> >>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
> >>>
> >>>
> >>> This causes spurious differences when the files are updated.
> >>>
> >>> Can these files be easily fixed without causing huge diffs to be generated?
> >>>
> >>> Also, is there any way to prevent such files being committed to the repo?
> >>>
> >>> S.
> >> If svn:eol-style is set to "native", it shouldn't matter. I think this
> >> can be defaulted for newly added files.
> > Thanks, but this is Git, not SVN.
> >
> >> In Jackrabbit, I regularly run a script to spot new files missing the
> >> property.
> > Are you willing to share the script?
>
> This was recently a problem in [statistics]. It was fixed using a
> .gitattributes file [1] containing:
>
> * text=auto
>
> You can fix all the existing files following the steps detailed on the
> git documentation:
>
> $ echo "* text=auto" >.gitattributes
>
> $ git add --renormalize .
>
> $ git status        # Show files that will be normalized
>
> $ git commit -m "Introduce end-of-line normalization"

Thanks, though that did not pick up two of the files.

However it looks like the commit message will show huge diffs for each file.

Is that unavoidable?

> [1] https://git-scm.com/docs/gitattributes
>
>
> >
> >> Best regards, Julian



Re: [CODEC] CRLF files in macOS checkout

Posted by Alex Herbert <al...@gmail.com>.
On 18/06/2019 09:55, sebb wrote:
> On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
>> On 17.06.2019 23:26, sebb wrote:
>>> Most of the files in my clone of codec have LF endings, however a few are CRLF:
>>>
>>> ./README.md
>>> ./src/assembly/bin.xml
>>> ./src/assembly/src.xml
>>> ./src/changes/changes.xml
>>> ./src/main/java/org/apache/commons/codec/cli/Digest.java
>>> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
>>> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
>>> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
>>> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
>>> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
>>> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>>>
>>>
>>> This causes spurious differences when the files are updated.
>>>
>>> Can these files be easily fixed without causing huge diffs to be generated?
>>>
>>> Also, is there any way to prevent such files being committed to the repo?
>>>
>>> S.
>> If svn:eol-style is set to "native", it shouldn't matter. I think this
>> can be defaulted for newly added files.
> Thanks, but this is Git, not SVN.
>
>> In Jackrabbit, I regularly run a script to spot new files missing the
>> property.
> Are you willing to share the script?

This was recently a problem in [statistics]. It was fixed using a 
.gitattributes file [1] containing:

* text=auto

You can fix all the existing files following the steps detailed on the 
git documentation:

$ echo "* text=auto" >.gitattributes

$ git add --renormalize .

$ git status        # Show files that will be normalized

$ git commit -m "Introduce end-of-line normalization"

[1] https://git-scm.com/docs/gitattributes
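
To see which tracked files are stored with CRLF endings, before or after the
steps above, something like the following may help. It is a sketch only: it
assumes a Git version that supports "git ls-files --eol", file names without
spaces, and it runs in a scratch repository with hypothetical file names
(crlf.txt, lf.txt).

```shell
#!/bin/sh
# Demonstration in a throwaway repository; nothing real is touched.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"
git config core.autocrlf false   # commit the bytes exactly as written

printf 'one\r\ntwo\r\n' > crlf.txt   # CRLF line endings
printf 'one\ntwo\n' > lf.txt         # LF line endings
git add crlf.txt lf.txt
git commit -q -m "Add sample files"

# The first column of "git ls-files --eol" describes the committed copy;
# "i/crlf" means the index version uses CRLF. The last field is the path.
crlf_files=$(git ls-files --eol | awk '$1 == "i/crlf" { print $NF }')
echo "$crlf_files"
```

Run in a real clone, only the last pipeline is needed; after a successful
renormalization commit it should print nothing.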


>
>> Best regards, Julian

Re: [CODEC] CRLF files in macOS checkout

Posted by Julian Reschke <ju...@gmx.de>.
On 18.06.2019 10:55, sebb wrote:
> ...
>> If svn:eol-style is set to "native", it shouldn't matter. I think this
>> can be defaulted for newly added files.
>
> Thanks, but this is Git, not SVN.

Ah. I don't think Git can help here.

>> In Jackrabbit, I regularly run a script to spot new files missing the
>> property.
>
> Are you willing to share the script?

Yes, but it's specific to SVN:

-- snip --
find . "(" -name "*.js" -o -name "*.java" -o -name "*.xml" \
  -o -name "*.groovy" -o -name "*.md" -o -name "*.html" -o -name "*.css" \
  -o -name "*.cfg" -o -name package-list -o -name "*.xslt" ")" \
  -exec svn propset svn:eol-style native {} ";"
-- snip --
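
For comparison, a rough Git-side analogue is to mark file types as text in
.gitattributes rather than setting a property per file. The sketch below is
an assumption, not something used in Jackrabbit or Commons: the pattern list
simply mirrors a few extensions from the find command, the paths are
hypothetical, and "git check-attr" is used only to confirm which paths the
patterns cover.

```shell
#!/bin/sh
# Demonstration in a throwaway repository; nothing real is touched.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Hypothetical pattern list: treat these file types as text so that
# Git normalizes their line endings on commit.
cat > .gitattributes <<'EOF'
*.java text
*.xml  text
*.md   text
EOF

# check-attr reports the attribute for a path; the path need not exist.
java_attr=$(git check-attr text -- src/Example.java)
png_attr=$(git check-attr text -- logo.png)
echo "$java_attr"
echo "$png_attr"
```

Paths that match no pattern are reported as unspecified and are left alone.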



Re: [CODEC] CRLF files in macOS checkout

Posted by sebb <se...@gmail.com>.
On Tue, 18 Jun 2019 at 08:15, Julian Reschke <ju...@gmx.de> wrote:
>
> On 17.06.2019 23:26, sebb wrote:
> > Most of the files in my clone of codec have LF endings, however a few are CRLF:
> >
> > ./README.md
> > ./src/assembly/bin.xml
> > ./src/assembly/src.xml
> > ./src/changes/changes.xml
> > ./src/main/java/org/apache/commons/codec/cli/Digest.java
> > ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
> > ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
> > ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
> > ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
> > ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
> > ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
> >
> >
> > This causes spurious differences when the files are updated.
> >
> > Can these files be easily fixed without causing huge diffs to be generated?
> >
> > Also, is there any way to prevent such files being committed to the repo?
> >
> > S.
>
> If svn:eol-style is set to "native", it shouldn't matter. I think this
> can be defaulted for newly added files.

Thanks, but this is Git, not SVN.

> In Jackrabbit, I regularly run a script to spot new files missing the
> property.

Are you willing to share the script?

> Best regards, Julian



Re: [CODEC] CRLF files in macOS checkout

Posted by Julian Reschke <ju...@gmx.de>.
On 17.06.2019 23:26, sebb wrote:
> Most of the files in my clone of codec have LF endings, however a few are CRLF:
>
> ./README.md
> ./src/assembly/bin.xml
> ./src/assembly/src.xml
> ./src/changes/changes.xml
> ./src/main/java/org/apache/commons/codec/cli/Digest.java
> ./src/main/java/org/apache/commons/codec/language/DaitchMokotoffSoundex.java
> ./src/main/resources/org/apache/commons/codec/language/bm/lang.txt
> ./src/test/java/org/apache/commons/codec/digest/HmacAlgorithmsTest.java
> ./src/test/java/org/apache/commons/codec/digest/MessageDigestAlgorithmsTest.java
> ./src/test/java/org/apache/commons/codec/digest/PureJavaCrc32Test.java
> ./src/test/java/org/apache/commons/codec/language/ColognePhoneticTest.java
>
>
> This causes spurious differences when the files are updated.
>
> Can these files be easily fixed without causing huge diffs to be generated?
>
> Also, is there any way to prevent such files being committed to the repo?
>
> S.

If svn:eol-style is set to "native", it shouldn't matter. I think this
can be defaulted for newly added files.

In Jackrabbit, I regularly run a script to spot new files missing the
property.

Best regards, Julian
