Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/10/15 07:20:33 UTC

[jira] Created: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Speed up ivy resolution in builds with clever caching
-----------------------------------------------------

                 Key: MAPREDUCE-1114
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: build
    Affects Versions: 0.22.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Minor


An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have already been fetched and cached. Profiling showed that this time goes to XML parsing. I have a sort-of-ugly hack that uses some Ant macros to cache the resolved classpaths, which significantly speeds up incremental compiles (and, more importantly, "ant test").

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786234#action_12786234 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

Doug: the slowness is actually in the resolve task which generates the various classpath properties in ant. Without caching those properties to disk, there's no way to get around running ivy that I can think of. This patch essentially persists them to disk between runs, since the majority of the time they don't change.


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Attachment: mapreduce-1114.txt

Attaching an up-to-date patch.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786265#action_12786265 ] 

Chris Douglas commented on MAPREDUCE-1114:
------------------------------------------

bq. Comparing the 15-second payoff to the full build time isn't particularly important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff

As a vi user, I got that. I haven't argued that the long build times are unimportant, but that a hack introducing a custom caching layer for classpaths is not, in my mind, a justifiable tradeoff in complexity. Maintaining black magic in the build is tedious and avoidable.

bq. the slowness is actually in the resolve task which generates the various classpath properties in ant

Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?

bq. My most common development cycle is to run a single unit test. For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on.

As a workaround: depending on how often I'm running it, adding a {{main}} to the unit test is sometimes worthwhile.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786211#action_12786211 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

bq. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.

Comparing the 15-second payoff to the full build time isn't particularly important to me. For me, the ability to quickly iterate on code while recompiling and rerunning unit tests is the big payoff, so I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

I may be in the minority, though, as I don't use Eclipse or any other fancy IDE that does incremental compilation.

Anyone else care to chime in?


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Attachment: mapreduce-1114.txt

Attaching a patch that speeds up a contrib-skipping null test case run from about 25 seconds to about 11 seconds wallclock (about 33 to 16 seconds of user CPU time), per the timings below.

{noformat}
todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null

real    0m11.360s
user    0m15.897s
sys     0m1.436s
todd@todd-laptop:~/git/hadoop-mapreduce$ git reset --hard 'HEAD^'
HEAD is now at 8ac54e1 MAPREDUCE-1113. Fix mumak to not compile aspects with skip.contrib is set
todd@todd-laptop:~/git/hadoop-mapreduce$ time ant test -Dskip.contrib=1 -Dtestcase=xxxx > /dev/null

real    0m25.222s
user    0m32.870s
sys     0m1.948s
{noformat}
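
For reference, the saving implied by the two runs above can be computed directly from the `real` (wallclock) and `user` (CPU) figures. A small Python sketch, with the numbers copied from the output and nothing else assumed:

```python
# real/user seconds copied from the two `time ant test ...` runs above
BEFORE = {"real": 25.222, "user": 32.870}  # HEAD^ (without the patch)
AFTER = {"real": 11.360, "user": 15.897}   # with the patch applied


def reduction(old: float, new: float) -> float:
    """Fraction of the original time that is saved."""
    return (old - new) / old


for kind in ("real", "user"):
    saved = BEFORE[kind] - AFTER[kind]
    ratio = BEFORE[kind] / AFTER[kind]
    print(f"{kind}: {saved:.1f} s saved, "
          f"{reduction(BEFORE[kind], AFTER[kind]):.0%} less time, "
          f"{ratio:.1f}x faster")
```

That comes to roughly 14 s (55%) of wallclock time and 17 s (52%) of user CPU time saved per null test run.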

There's still a bit of room to improve here: SLF4J is missing some uptodate check, which causes an unnecessary javac pass.

The way this works: the macro writes the resolved classpath into build/ivy/ivy-resolve-cache/<project>-<conf> if that file doesn't already exist. If it does exist, the macro loads the file into the appropriate properties as if Ivy had just done the resolution.
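
As a rough illustration of that load-or-write logic, here is a Python sketch. It is not the patch's Ant macro, which isn't reproduced here: the `resolve` callback is a hypothetical stand-in for the Ivy resolution, and `example-project`, `common` and the jar names are made up. Only the <project>-<conf> file naming follows the description above.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_classpath(cache_dir: Path, project: str, conf: str,
                     resolve: Callable[[], str]) -> str:
    """Return the classpath for (project, conf), calling `resolve` only on a miss.

    `resolve` stands in for the expensive Ivy resolution; the cache file is
    named <project>-<conf>, mirroring build/ivy/ivy-resolve-cache/<project>-<conf>.
    """
    cache_file = cache_dir / f"{project}-{conf}"
    if cache_file.exists():
        # Hit: reuse the classpath persisted by an earlier run.
        return cache_file.read_text()
    # Miss: resolve once and persist the result for later runs.
    classpath = resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(classpath)
    return classpath


if __name__ == "__main__":
    def slow_resolve() -> str:
        print("resolving...")  # printed only on a cache miss
        return "lib/a.jar:lib/b.jar"

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "build" / "ivy" / "ivy-resolve-cache"
        for _ in range(2):  # the second call loads the file instead of resolving
            print(cached_classpath(cache, "example-project", "common", slow_resolve))
```

As described, nothing in this scheme notices a change to the dependency list; deleting the cache file is what forces a fresh resolution.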

This patch is not final quality: it prints some debugging output and could be cleaned up a bit. I just wanted to see what people thought before spending the time to make it prettier.

You may wonder why this is starting as a MAPREDUCE patch and not a HADOOP one: no particular reason. If people like this, I will do the same for Common and HDFS.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770161#action_12770161 ] 

Hadoop QA commented on MAPREDUCE-1114:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423234/mapreduce-1114.txt
  against trunk revision 829529.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/213/console

This message is automatically generated.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765967#action_12765967 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

bq. SLF4J is missing some uptodate check, which causes an unnecessary javac pass.

Sorry, I misread some output there. The failing uptodate check is due to avro-generate regenerating its .java output on every build. I'll attack that in a later JIRA next time I get aggravated by slow builds ;-)


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786268#action_12786268 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

bq. Aren't the classpaths named? Would there be a way to short-circuit the resolution if it created/checked for a file mapped to that path?

That is exactly what this patch does...


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766121#action_12766121 ] 

Doug Cutting commented on MAPREDUCE-1114:
-----------------------------------------

> The uptodate check is due to avro-generate regenerating its .java output for every build.

I filed AVRO-150 for this.



[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792749#action_12792749 ] 

Chris Douglas commented on MAPREDUCE-1114:
------------------------------------------

bq. Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work [...]

If I understand you correctly, the punchline is that improvements to intra-build caching are not only tedious, but also not a sound way of reducing the build time, as most classpaths are independently defined. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction. Is that the argument?

bq. fixing ivy doesn't make much sense - we'd be better off focusing on moving towards Maven.

Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792687#action_12792687 ] 

Konstantin Boudnik commented on MAPREDUCE-1114:
-----------------------------------------------

Well, build.xml has 7 'retrieves' in it. If you also add contrib to this, it's gonna be a total mess (e.g. 22 re-resolutions). IMO fixing ivy doesn't make much sense; we'd be better off focusing on moving towards Maven.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792667#action_12792667 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

The issue is that the build ends up spawning a lot of subants, each of which re-resolves everything. I get a total of 22 ivy-resolves even if I skip contrib! Part of this is that skip.contrib=1 still resolves all of the contrib stuff (MAPREDUCE-1113).

{quote}
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve' | wc -l
22
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'ivy-resolve-common' | wc -l
19
{quote}

Some of these are within the same ant run, so they get cached. But 16 of them actually do some non-cached work:
{quote}
todd@todd-laptop:~/git/hadoop-mapreduce$ ant test -Dskip.contrib=1 -Dtestcase=xxx 2>&1 | grep 'resolving dependencies' | wc -l
16
{quote}


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1114:
-------------------------------------

    Status: Open  (was: Patch Available)

The patch is stale.

The long build times are a problem and ivy's a big part of that, but I agree with your assessment: this is a hack. I don't think the 15 second payoff justifies the maintenance cost of a custom caching layer for ivy.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786309#action_12786309 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

When a classpath is resolved, it's written out to a text file named for that variable. Then, when it needs to be resolved again, that file is loaded if it exists, rather than re-resolving the classpath.


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Attachment: mapreduce-1114.txt

Forgot to include build-macros.xml in previous patch


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786342#action_12786342 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

Ivy already caches the resolves done in the same run, in theory, but there are a lot of "different" resolves, I think? The gain here *is* from caching between runs as you surmised.
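
The difference between caching within a run and caching between runs can be shown with a toy model. It assumes, following the earlier comment about subants, that each subant starts with an empty in-memory cache; the project/conf names and the subant layout are invented, and nothing here models Ivy itself.

```python
from typing import List, Set, Tuple

Key = Tuple[str, str]  # (project, conf); names below are hypothetical


def count_resolutions(builds: int, subants: List[List[Key]], disk_cache: bool) -> int:
    """Count expensive resolutions over `builds` consecutive builds.

    Toy model: every build runs the same subants, each subant starts with an
    empty in-memory cache (repeats within one subant are free), and the
    optional on-disk cache survives across subants and across builds.
    """
    on_disk: Set[Key] = set()
    expensive = 0
    for _ in range(builds):
        for requests in subants:
            in_memory: Set[Key] = set()  # fresh for every subant
            for key in requests:
                if key in in_memory or (disk_cache and key in on_disk):
                    continue
                expensive += 1  # a full resolution, XML parsing and all
                in_memory.add(key)
                on_disk.add(key)
    return expensive


EXAMPLE = [
    [("core", "common"), ("core", "common"), ("core", "test")],
    [("core", "common"), ("core", "test")],
    [("core", "common")],
]

print(count_resolutions(3, EXAMPLE, disk_cache=False))  # in-memory caching only
print(count_resolutions(3, EXAMPLE, disk_cache=True))   # plus a cache that persists between runs
```

With these made-up inputs, three builds cost 15 resolutions with in-memory caching alone and 2 with a persistent cache. The point is only that per-process caching cannot help once each subant starts fresh.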


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786225#action_12786225 ] 

Doug Cutting commented on MAPREDUCE-1114:
-----------------------------------------

> I look at this as a 60% speedup in my development cycle rather than a few % speedup in the full build.

I agree with this logic.  My most common development cycle is to run a single unit test.  For Avro this takes just a few seconds, and I'm willing to wait without finding a new task to work on.  With Hadoop it takes long enough that I switch to doing something else, lose my context, etc.  Improving this significantly will significantly improve many developers' productivity.

I wonder if we can simply check whether build/ivy/lib/Hadoop-Hdfs/{common,test} exist and, if they do, assume they're up to date, running Ivy only otherwise.  Might that be simpler?
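
This suggestion trades freshness checks for simplicity: whether the directories exist is the only thing that decides if Ivy runs. Below is a minimal Python sketch of that decision. It assumes a hypothetical `run_ivy` callback that retrieves jars into the listed directories, and it assumes the classpath is then built from whatever jars those directories already contain.

```python
from pathlib import Path
from typing import Callable, Iterable


def ensure_libs(lib_dirs: Iterable[Path], run_ivy: Callable[[], None]) -> bool:
    """Run Ivy only if some expected lib directory is missing.

    Returns True if Ivy was run. Existing directories are assumed to be
    up to date; their contents are never compared against ivy.xml.
    """
    if all(d.is_dir() for d in lib_dirs):
        return False  # everything present: skip Ivy entirely
    run_ivy()  # hypothetical callback that populates the directories
    return True
```

The catch is that this approach never notices an edited ivy.xml. A fresh resolve happens only when the directories are deleted, for example by a clean.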


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792750#action_12792750 ] 

Todd Lipcon commented on MAPREDUCE-1114:
----------------------------------------

bq. So without fundamentally changing how dependencies are resolved, attacking the problem as in the current patch is the only way to effect a meaningful reduction

Yes, as far as I'm aware, that's the case. Thanks for the concise way of explaining it.

To be completely honest, I'm nowhere near an expert in ant/ivy/maven/etc. I'm just a developer who doesn't use an IDE, and waiting 30-40 seconds every time I needed to rerun a test case got aggravating ;-)

bq. Is there a JIRA tracking progress in removing ivy?

I'm not aware of any such.


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1114:
-----------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792797#action_12792797 ] 

Konstantin Boudnik commented on MAPREDUCE-1114:
-----------------------------------------------

bq. Is there a JIRA tracking progress in removing ivy? If it's not happening in the near term, then something like the current patch may be worth keeping in trunk during interim 0.22 development.

I know that folks here and there are eager to move to Maven, but I don't know how soon that might actually happen. That said, I'm completely fine with having such a short-term solution in place.


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786305#action_12786305 ] 

Chris Douglas commented on MAPREDUCE-1114:
------------------------------------------

Then I'm missing something. What is being "cached"?


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786322#action_12786322 ] 

Chris Douglas commented on MAPREDUCE-1114:
------------------------------------------

I thought the bulk of the problem was re-resolving these properties during the same run. Is that mistaken? The current proposal also works across runs, which could be helpful, but again: maintaining the build is already a pain. Adding a cache to a bad idea is a well-established software engineering practice, but if middling performance requires this, I'd favor either fixing our use of ivy or replacing it.
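
A small Python sketch can make the distinction between caching within a run and caching across runs concrete. Here `resolve` is a hypothetical stand-in for one Ivy resolve, and `functools.lru_cache` plays the role of an in-process cache. This is a model of the argument, not how Ivy's own cache is implemented.

```python
from functools import lru_cache
from typing import List

# Records each time the "expensive" resolve body actually executes.
RESOLVE_CALLS: List[str] = []


@lru_cache(maxsize=None)
def resolve(conf: str) -> str:
    """Hypothetical stand-in for one Ivy resolve of configuration `conf`.

    lru_cache memoizes results only for the lifetime of this process.
    """
    RESOLVE_CALLS.append(conf)
    return f"classpath-for-{conf}"


if __name__ == "__main__":
    resolve("common")
    resolve("common")      # same conf: served from the in-process cache
    resolve("test")        # different conf: a new, full resolve
    print(RESOLVE_CALLS)   # ['common', 'test']
    resolve.cache_clear()  # stand-in for starting a new `ant` process
    resolve("common")
    print(RESOLVE_CALLS)   # ['common', 'test', 'common']
```

In this model, repeated identical resolves hit the in-process cache. Each distinct configuration still pays the full cost once, and a new process starts from scratch. Only a cache persisted to disk survives that boundary.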


[jira] Commented: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788462#action_12788462 ] 

Konstantin Boudnik commented on MAPREDUCE-1114:
-----------------------------------------------

Maybe I'm barking up the wrong tree, but I tried running a couple of commands in the current MR trunk:
{noformat}
 % time ant ivy-resolve-common
Buildfile: build.xml
...
BUILD SUCCESSFUL
Total time: 3 seconds

real    0m4.513s
user    0m5.186s
sys     0m0.616s
{noformat}

I got a very similar result for {{% time ant ivy-resolve-tree}}

So it seems to me that the resolver works pretty damn fast, considering the number of artifacts it needs to check. Maybe the latency is hiding somewhere else?
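
One way to find where the latency hides is to time each candidate command several times and compare the medians. For example, you could compare a standalone resolve target against the command used to rerun a single test. Below is a minimal Python sketch. `median_wall_time` is a hypothetical helper, and the reader chooses which commands to compare.

```python
import statistics
import subprocess
import time
from typing import Sequence


def median_wall_time(cmd: Sequence[str], runs: int = 3) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

One caveat applies: if a single test run triggers several distinct resolves, timing one resolve target in isolation may understate their combined cost.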
