You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Torsten Curdt <tc...@vafer.org> on 2018/01/03 15:46:55 UTC

[compress] TarArchiveEntry and Windows drive letter

I just found a new issue with compress.

It's the "normalizeFileName" in TarArchiveEntry again.
On Windows it just strips the drive letter

https://github.com/apache/commons-compress/blob/master/src/
main/java/org/apache/commons/compress/archivers/tar/
TarArchiveEntry.java#L1337

which I think is a questionable default behaviour IMO.

   TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
   TarArchiveEntry("D:\foo\bar") -> "/foo/bar"

Is that in line with how other unix tool ports handle things on windows?

cheers,
Torsten

PS: Just for you sebb ;)

Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Stefan Bodewig <bo...@apache.org>.
On 2018-01-28, Torsten Curdt wrote:

>>> This is what I suggested. Change the meaning of "preserveLeadingSlashes"
>>> to "preserveAbsolutePath" and use it to keep the drive letter on
>>> Windows.

>>> The current code will always strip the drive letter spec.

>> In that case, I agree with you.

>> Also as a name,  'preserveLeadingSlashes' only makes sense on a unix
>> system, and it describes the processing rather than the functionality.

> I guess that could be a reasonable compromise - and certainly much clearer
> from an API point of view.

Done.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Torsten Curdt <tc...@vafer.org>.
> > This is what I suggested. Change the meaning of "preserveLeadingSlashes"
> > to "preserveAbsolutePath" and use it to keep the drive letter on
> > Windows.
> >
> > The current code will always strip the drive letter spec.
>
> In that case, I agree with you.
>
> Also as a name,  'preserveLeadingSlashes' only makes sense on a unix
> system, and it describes the processing rather than the functionality.
>

I guess that could be a reasonable compromise - and certainly much clearer
from an API point of view.

cheers,
Torsten

Re: [compress] TarArchiveEntry and Windows drive letter

Posted by sebb <se...@gmail.com>.
On 22 January 2018 at 09:53, Stefan Bodewig <bo...@apache.org> wrote:
> On 2018-01-21, sebb wrote:
>
>> On 21 January 2018 at 16:07, Stefan Bodewig <bo...@apache.org> wrote:
>
>>> The contract of tar archives is they contain relative path names. GNU
>>> tar strips leading slashes both when creating archives and when
>>> extracting archives who's entry names contain leading slashes. Sticking
>>> with that contract I still believe we should remove leading slashes by
>>> default - and removing the drive letters is the Windows equivalent of it
>>> IMHO. That's why I suggest to keep the drive letter in the
>>> preserveLeadingSlashes == true case.
>
>> I would expect the Windows default behaviour to generate/use relative
>> names only.
>> This means dropping both the drive letter *and* any leading \ or /.
>
> This is what happens right now.
>
>> Optionally, tar should be able to preserve/use absolute path names.
>> But the user must specify this.
>
> This is what I suggested. Change the meaning of "preserveLeadingSlashes"
> to "preserveAbsolutePath" and use it to keep the drive letter on
> Windows.
>
> The current code will always strip the drive letter spec.

In that case, I agree with you.

Also as a name,  'preserveLeadingSlashes' only makes sense on a unix
system, and it describes the processing rather than the functionality.

> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Stefan Bodewig <bo...@apache.org>.
On 2018-01-21, sebb wrote:

> On 21 January 2018 at 16:07, Stefan Bodewig <bo...@apache.org> wrote:

>> The contract of tar archives is they contain relative path names. GNU
>> tar strips leading slashes both when creating archives and when
>> extracting archives who's entry names contain leading slashes. Sticking
>> with that contract I still believe we should remove leading slashes by
>> default - and removing the drive letters is the Windows equivalent of it
>> IMHO. That's why I suggest to keep the drive letter in the
>> preserveLeadingSlashes == true case.

> I would expect the Windows default behaviour to generate/use relative
> names only.
> This means dropping both the drive letter *and* any leading \ or /.

This is what happens right now.

> Optionally, tar should be able to preserve/use absolute path names.
> But the user must specify this.

This is what I suggested. Change the meaning of "preserveLeadingSlashes"
to "preserveAbsolutePath" and use it to keep the drive letter on
Windows.

The current code will always strip the drive letter spec.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by sebb <se...@gmail.com>.
On 21 January 2018 at 16:07, Stefan Bodewig <bo...@apache.org> wrote:
> On 2018-01-05, Torsten Curdt wrote:
>
>>>>    TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
>>>>    TarArchiveEntry("D:\foo\bar") -> "/foo/bar"
>
>>> "strip the drive letter" predates preserveLeadingSlashes
>
>> Might be but I guess the only question is how to fix it without breaking
>> too many things.
>> I am not such a big fan of "preserveLeadingSlashes" either.
>
> We inherited that from Ant and it really stems from a rather odd usecase
> over there.
>
>>> I could be convinced that we should only strip drive letters when we
>>> don't preserve leading slashes either :-)
>
>> IMO there is just no case at all when compress should strip drive letters.
>> Sure it might be convenient (for some) but this is too much magic.
>
> Again, I have no idea what tar ports do on Windows. By now all my
> Windows instances run the Linux subsystem if I want to use tar.
>
> This is what GNU tar does on Linux:
>
> $ tar cf x.tar /tmp
> tar: Removing leading `/' from member names
>
> so us stripping leading slashes by default is completely in line with
> what the CLI tar does.
>
>>> Usually tar archives do not contain absolute path names at all.
>
>> Which is also why the current constructors have a huge potential of
>> ambiguity and misinterpretation IMO.
>
> I've just now changed the javadocs to explicitly state how names get
> constructed. I've left out documenting what happens to drive letters so
> I don't need to change it depending on the outcome of this discussion :-)
>
>> If it should be relative and you pass a File - relative to what?
>
> It is File.getPath() so it is the string argument passed to the File
> constructor.
>
>> It might be worth checking with Windows users to see what the expected
>> behaviour might be.
>
> Allow me to point out that this has been the behavior of the class for
> far more than ten years and no Windows user has ever complained.
> Neither here nor in Ant land. This may indicate the behavior is not too
> surprising. It may also indicate that most people really only want to
> use relative file names regardless of platform.

Any other behaviour is potentially dangerous so must be optional.

>> But even then - I still think silently stripping is the wrong thing to
>> do.
>
> The contract of tar archives is they contain relative path names. GNU
> tar strips leading slashes both when creating archives and when
> extracting archives who's entry names contain leading slashes. Sticking
> with that contract I still believe we should remove leading slashes by
> default - and removing the drive letters is the Windows equivalent of it
> IMHO. That's why I suggest to keep the drive letter in the
> preserveLeadingSlashes == true case.

I would expect the Windows default behaviour to generate/use relative
names only.
This means dropping both the drive letter *and* any leading \ or /.

Optionally, tar should be able to preserve/use absolute path names.
But the user must specify this.

> Cheers
>
>         Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Stefan Bodewig <bo...@apache.org>.
On 2018-01-05, Torsten Curdt wrote:

>>>    TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
>>>    TarArchiveEntry("D:\foo\bar") -> "/foo/bar"

>> "strip the drive letter" predates preserveLeadingSlashes

> Might be but I guess the only question is how to fix it without breaking
> too many things.
> I am not such a big fan of "preserveLeadingSlashes" either.

We inherited that from Ant and it really stems from a rather odd usecase
over there.

>> I could be convinced that we should only strip drive letters when we
>> don't preserve leading slashes either :-)

> IMO there is just no case at all when compress should strip drive letters.
> Sure it might be convenient (for some) but this is too much magic.

Again, I have no idea what tar ports do on Windows. By now all my
Windows instances run the Linux subsystem if I want to use tar.

This is what GNU tar does on Linux:

$ tar cf x.tar /tmp
tar: Removing leading `/' from member names

so us stripping leading slashes by default is completely in line with
what the CLI tar does.

>> Usually tar archives do not contain absolute path names at all.

> Which is also why the current constructors have a huge potential of
> ambiguity and misinterpretation IMO.

I've just now changed the javadocs to explicitly state how names get
constructed. I've left out documenting what happens to drive letters so
I don't need to change it depending on the outcome of this discussion :-)

> If it should be relative and you pass a File - relative to what?

It is File.getPath() so it is the string argument passed to the File
constructor.

> It might be worth checking with Windows users to see what the expected
> behaviour might be.

Allow me to point out that this has been the behavior of the class for
far more than ten years and no Windows user has ever complained.
Neither here nor in Ant land. This may indicate the behavior is not too
surprising. It may also indicate that most people really only want to
use relative file names regardless of platform.

> But even then - I still think silently stripping is the wrong thing to
> do.

The contract of tar archives is they contain relative path names. GNU
tar strips leading slashes both when creating archives and when
extracting archives who's entry names contain leading slashes. Sticking
with that contract I still believe we should remove leading slashes by
default - and removing the drive letters is the Windows equivalent of it
IMHO. That's why I suggest to keep the drive letter in the
preserveLeadingSlashes == true case.

Cheers

        Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Torsten Curdt <tc...@vafer.org>.
> >    TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
> >    TarArchiveEntry("D:\foo\bar") -> "/foo/bar"
>
> "strip the drive letter" predates preserveLeadingSlashes
>

Might be but I guess the only question is how to fix it without breaking
too many things.
I am not such a big fan of "preserveLeadingSlashes" either.


I could be convinced that we should only strip drive letters when we
> don't preserve leading slashes either :-)
>

IMO there is just no case at all when compress should strip drive letters.
Sure it might be convenient (for some) but this is too much magic.


Usually tar archives do not contain absolute path names at all.
>

Which is also why the current constructors have a huge potential of
ambiguity and misinterpretation IMO.

  TarArchiveEntry(File file)
  TarArchiveEntry(File file, String fileName)
  TarArchiveEntry(String name)
  TarArchiveEntry(String name, boolean preserveLeadingSlashes)
  TarArchiveEntry(String name, byte linkFlag)
  TarArchiveEntry(String name, byte linkFlag, boolean
preserveLeadingSlashes)

If it should be relative and you pass a File - relative to what?
If I pass in a string there should be no need for "preserveLeadingSlashes"
it should be just that string.

Just some random thoughts.

Maybe worth cleaning up that part of the API.


> Is that in line with how other unix tool ports handle things on windows?
>
> No idea, really.
>

It might be worth checking with Windows users to see what the expected
behaviour might be.
But even then - I still think silently stripping is the wrong thing to do.

cheers,
Torsten

Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Stefan Bodewig <bo...@apache.org>.
On 2018-01-03, Torsten Curdt wrote:

> I just found a new issue with compress.

> It's the "normalizeFileName" in TarArchiveEntry again.
> On Windows it just strips the drive letter

> https://github.com/apache/commons-compress/blob/master/src/
> main/java/org/apache/commons/compress/archivers/tar/
> TarArchiveEntry.java#L1337

> which I think is a questionable default behaviour IMO.

>    TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
>    TarArchiveEntry("D:\foo\bar") -> "/foo/bar"

"strip the drive letter" predates preserveLeadingSlashes

https://github.com/apache/commons-compress/commit/32eea1e611d66a3c1a6b1dc0b373a943614e5680#diff-b7b626a7d92545685d08e93834bfa3d0

triggered by

https://issues.apache.org/jira/browse/COMPRESS-26

and later

https://github.com/apache/commons-compress/commit/3e2ddad124ea6ab1bc690487793bf88560ca3732#diff-b7b626a7d92545685d08e93834bfa3d0

I could be convinced that we should only strip drive letters when we
don't preserve leading slashes either :-)

Usually tar archives do not contain absolute path names at all.

> Is that in line with how other unix tool ports handle things on windows?

No idea, really.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] TarArchiveEntry and Windows drive letter

Posted by Gary Gregory <ga...@gmail.com>.
I do not see how skipping the drive letter on Windows is not normalizing,
it is _moving_.... sounds like a bug to me.

Gary

On Wed, Jan 3, 2018 at 8:46 AM, Torsten Curdt <tc...@vafer.org> wrote:

> I just found a new issue with compress.
>
> It's the "normalizeFileName" in TarArchiveEntry again.
> On Windows it just strips the drive letter
>
> https://github.com/apache/commons-compress/blob/master/src/
> main/java/org/apache/commons/compress/archivers/tar/
> TarArchiveEntry.java#L1337
>
> which I think is a questionable default behaviour IMO.
>
>    TarArchiveEntry("C:\foo\bar") -> "/foo/bar"
>    TarArchiveEntry("D:\foo\bar") -> "/foo/bar"
>
> Is that in line with how other unix tool ports handle things on windows?
>
> cheers,
> Torsten
>
> PS: Just for you sebb ;)
>