Posted to notifications@ant.apache.org by bu...@apache.org on 2018/12/09 22:45:05 UTC

[Bug 62995] New: Unzip performance regression on Windows due to BZ 62502

https://bz.apache.org/bugzilla/show_bug.cgi?id=62995

            Bug ID: 62995
           Summary: Unzip performance regression on Windows due to BZ
                    62502
           Product: Ant
           Version: 1.9.13
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core tasks
          Assignee: notifications@ant.apache.org
          Reporter: f.modler@gmx.net
  Target Milestone: ---

While playing around with maven-antrun-plugin and more recent Ant versions than
1.9.4 that comes with the plugin, I discovered that with 1.9.13 the unzip task
is way slower for larger zipfiles than 1.9.12.

Unfortunately I am not allowed to share the JBoss EAP 6.4 zipfile I was testing
with (a RedHat subscription is required).
The file is around 300 MB and contains 1340 directories and 1517 files.

I guess the problem is the use of File.getCanonicalPath() (twice for each file
being extracted), introduced by this change:
https://github.com/apache/ant/commit/6a41d62cb9ab4e640b72cb4de42a6c211dea645d#diff-28908450670f05abc5779fa0c9291510
Respective ticket: https://bz.apache.org/bugzilla/show_bug.cgi?id=62502

One possible simple fix might be to check File.getPath() for ".." and to call
isLeadingPath() if (and only if) at least one occurrence is found.
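
A minimal sketch of that idea, under stated assumptions: the class and method names below (EscapeCheckSketch, hasParentSegment, isInside, mayExtract) are hypothetical and are not Ant's actual code, and isInside only stands in for a canonical-path check such as isLeadingPath(). The fast path looks at ".." segments only, so it deliberately ignores symlinks.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch, not Ant's implementation.
class EscapeCheckSketch {

    /** Returns true if any segment of the path is exactly "..". */
    static boolean hasParentSegment(String path) {
        // Split on both separators so Windows and Unix style names are covered.
        for (String segment : path.split("[/\\\\]+")) {
            if ("..".equals(segment)) {
                return true;
            }
        }
        return false;
    }

    /** Slow path: compares canonical paths (stand-in for a canonical check). */
    static boolean isInside(File dir, File target) throws IOException {
        String dirPath = dir.getCanonicalPath();
        String targetPath = target.getCanonicalPath();
        String prefix = dirPath.endsWith(File.separator) ? dirPath : dirPath + File.separator;
        return targetPath.equals(dirPath) || targetPath.startsWith(prefix);
    }

    /** Only pays for canonicalization when the entry path contains "..". */
    static boolean mayExtract(File dir, String entryName) throws IOException {
        File target = new File(dir, entryName);
        if (!hasParentSegment(target.getPath())) {
            return true; // fast path: no ".." segment, nothing to resolve
        }
        return isInside(dir, target);
    }
}
```

With this shape, an entry such as "lib/a.jar" never triggers getCanonicalPath(), while "../evil.txt" still goes through the full check. Whether skipping the symlink case is acceptable depends on what the original fix for BZ 62502 was meant to guard against.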

Environment:
Java version: 1.8.0_192, vendor: Oracle Corporation
Default locale: de_DE, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"

PS: Maybe Linux is affected as well, didn't test...

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 62995] Unzip performance regression on Windows due to BZ 62502

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62995

Falko Modler <f....@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #3 from Falko Modler <f....@gmx.net> ---
>> Is this target using "unzip" in just a basic manner or is there anything more to it?

It's really just basic unzip.

>> I would expect the JRE implementation to return a cached value

Ok, maybe - I don't know. I just had a quick look at the code and did some
research on the method and found various blog entries etc. that contained a
warning that this method should be used sparingly.
If there is some caching, it must reside in native code. Another question is:
Does it cache sub-paths? Each file from a zip (normally) has a unique path
string...
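
One rough way to probe this outside of Ant is a small timing loop over many distinct paths. The sketch below is only an illustration: the class name CanonicalPathTiming, the directory name and the path layout are made up, and a loop like this is not a rigorous benchmark.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical timing sketch; names and path layout are illustrative only.
class CanonicalPathTiming {

    // Accumulates result lengths so the JIT cannot discard the calls.
    static long sink;

    /** Resolves count distinct paths below dir; returns elapsed nanoseconds. */
    static long time(File dir, int count, boolean canonical) throws IOException {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // Every entry gets its own path string, as in a typical zip.
            File f = new File(dir, "dir" + (i % 100) + File.separator + "file" + i + ".txt");
            String p = canonical ? f.getCanonicalPath() : f.getPath();
            sink += p.length();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "canonical-timing");
        int count = 3000; // roughly the number of entries mentioned in the report
        time(dir, count, true);  // warm-up runs
        time(dir, count, false);
        long plain = time(dir, count, false);
        long canon = time(dir, count, true);
        System.out.printf("getPath: %d us, getCanonicalPath: %d us%n",
                plain / 1000, canon / 1000);
    }
}
```

Note that the measured run repeats the same paths as the warm-up, so any canonical path cache in the JRE may already hold them; varying the file names between runs would show the uncached cost instead.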

>> Can you post the timings though, with 1.9.12 and 1.9.13?

I ran some tests and while there *is* a noticeable/distinct delay, it is *not*
"threefold" as I stated initially. I am not sure where/how I saw this massive
delay.

1.9.12 allowFilesToEscapeDest=false: ~8s

1.9.12 allowFilesToEscapeDest=true : ~8s

1.9.13 allowFilesToEscapeDest=false: ~10s

1.9.13 allowFilesToEscapeDest=true : ~8s

In the end this leaves us with a ~25% penalty.

PS: As I don't use standalone Ant, the tests were conducted with:
- maven-antrun-plugin (with updated ant dependency)
- Maven 3.3.9
- Java version: 1.8.0_192, vendor: Oracle Corporation
- Default locale: de_DE, platform encoding: Cp1252
- OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"

Hardware:
- Thinkpad P51
- Intel Core i7-7820HQ
- 64GB RAM
- Bitlocker-enabled NTFS partition on a Samsung SSD 960 Pro 1TB
- Virus Scanner (McAfee) is *not* active for the involved files/folders


[Bug 62995] Unzip performance regression on Windows due to BZ 62502

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62995

Jaikiran Pai <ja...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #1 from Jaikiran Pai <ja...@apache.org> ---
Hello Falko,

Is this target using "unzip" in just a basic manner or is there anything more
to it? If possible, can you attach that target.

>> I guess the problem is the use of File.getCanonicalPath() (twice for each file being extracted)

Looking at the code in question (the one which calls the
FileUtils.isLeadingPath()), the "dir" will always be the same destination
directory. So even if we end up calling getCanonicalPath twice, I would expect
the JRE implementation to return a cached value (as far as I remember, 1.8.x
latest version of Java still uses canonical path cache). So I think, it
shouldn't be that expensive (relatively). However, I'm not stating that this
isn't the cause and you probably are right that this change ended up being
expensive.

>> Unfortunately I am not allowed to share the JBoss EAP 6.4 zipfile I was testing with (a RedHat subscription is required).
>> The file is around 300 MB and contains 1340 directories and 1517 files.

Can you post the timings though, with 1.9.12 and 1.9.13? That will give us some
idea on what kind of performance regression we are seeing. In the meantime,
I'll see if I can reproduce this on a *nix with a similar large file.


[Bug 62995] Unzip performance regression on Windows due to BZ 62502

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62995

Jaikiran Pai <ja...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Jaikiran Pai <ja...@apache.org> ---
FWIW - I gave it a try with the community version of JBoss EAP, the WildFly zip
file (which is around 172MB and somewhat similar to the EAP zip contents) but I
couldn't notice any difference in timing performance between latest and older
versions of Ant. I am on a *nix system. It would be good to have a build file
or other additional data to reproduce this.


[Bug 62995] Unzip performance regression on Windows due to BZ 62502

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62995

Itan.riza <Mr...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Mriza2467@gmail.com
