Posted to pylucene-dev@lucene.apache.org by "Lee Skillen (JIRA)" <ji...@apache.org> on 2014/07/31 15:39:39 UTC

[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

    [ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080873#comment-14080873 ] 

Lee Skillen commented on PYLUCENE-31:
-------------------------------------

Andi - Did you (or anyone else) get a chance to review/try this?  Maybe it's a little too experimental, but thoughts appreciated. :-)

> JCC Parallel/Multiprocess Compilation + Caching
> -----------------------------------------------
>
>                 Key: PYLUCENE-31
>                 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
>             Project: PyLucene
>          Issue Type: Improvement
>         Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
>            Reporter: Lee Skillen
>            Priority: Minor
>              Labels: build, cache, ccache, distutils, jcc, parallel
>         Attachments: feature-parallel-build.patch
>
>
> JCC uses distutils.Extension() to build both JCC itself and the packages it generates for Java wrapping. Unfortunately, distutils compiles sequentially and does not use any spare cores for parallel building.  As discussed on the list, this is likely a design decision, made to avoid issues that can arise when building projects with awkward, cyclic or recursive dependencies.
> These issues shouldn't appear in JCC-based projects because of the generative nature of the build: all dependencies are resolved and generated before building, and the build itself only compiles and assembles the wrapper, whose files are confined to a flat sequence of compilation units.
> Enabling this requires monkey patching distutils, which was also discussed on the list as a potential source of issues, although we feel the risk is likely lower than that of the setuptools patching currently in use.  This would be optional functionality, enabled only if the monkey patching succeeds.  Distutils is also part of the standard library and might be less susceptible to change than setuptools, and the area of code being patched has barely changed since 2002 (see: http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
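For readers unfamiliar with the approach, here is a minimal sketch of what monkey patching distutils for parallel compilation can look like. It is not the code in feature-parallel-build.patch: it follows the commonly circulated recipe, it uses a thread pool rather than multiprocessing (each compile already runs the compiler in a child process), and the names parallel_compile and enable_parallel_build and the default of 8 jobs are illustrative assumptions. It relies on the private CCompiler helpers _setup_compile, _get_cc_args and _compile, and it declines to patch when distutils or those helpers are missing (distutils was removed from the standard library in Python 3.12).

```python
import warnings
from multiprocessing.pool import ThreadPool


def parallel_compile(self, sources, output_dir=None, macros=None,
                     include_dirs=None, debug=0, extra_preargs=None,
                     extra_postargs=None, depends=None, jobs=8):
    """Drop-in replacement for CCompiler.compile (hypothetical name).

    'jobs' is an illustrative extra parameter, not part of distutils.
    """
    # Same preparation steps as the stock CCompiler.compile().
    macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
        output_dir, macros, include_dirs, sources, depends, extra_postargs)
    cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

    def compile_one(obj):
        # Objects missing from 'build' are already up to date; skip them.
        try:
            src, ext = build[obj]
        except KeyError:
            return
        self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

    # Threads are enough here: the real work happens in compiler subprocesses.
    pool = ThreadPool(jobs)
    try:
        pool.map(compile_one, objects)
    finally:
        pool.close()
        pool.join()
    return objects


def enable_parallel_build():
    """Install the patch; return False (and change nothing) if not possible."""
    try:
        with warnings.catch_warnings():
            # distutils emits a DeprecationWarning on Python 3.10 and 3.11.
            warnings.simplefilter("ignore")
            from distutils import ccompiler
    except ImportError:
        return False
    required = ("_setup_compile", "_get_cc_args", "_compile")
    if not all(hasattr(ccompiler.CCompiler, name) for name in required):
        return False
    ccompiler.CCompiler.compile = parallel_compile
    return True
```

A setup script would call enable_parallel_build() before setup() and simply carry on with the serial build if it returns False.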
> In addition to the distutils changes, this patch changes the wrapper class generation to make it more cache friendly, the goal being that no change in the wrapped code means no change in the wrapper code.  With a tool such as ccache, a change that touches only a small part of the wrapped code should therefore rebuild much faster (approaching 1/n of the full compile time when only one of n files has changed).
> Obviously the maintainers would have to assess this risk and decide whether to accept the patch.  The code has only been tested on Linux with Python 2.7.5, but it should fail gracefully and disable parallelisation if any requirement isn't met (not on Linux, no multiprocessing support, or the monkey patching somehow fails).  The caching change should still benefit everyone regardless.
> Please note that an additional dependency on orderedset has been added to achieve the more deterministic ordering. This may not be desirable (e.g. another package such as ordered-set might be preferred, or the code might be inlined into the package instead), subject to maintainer comments.
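To illustrate why deterministic ordering matters for caching, a small sketch follows. It is not JCC's generator: render_wrapper, unique_in_order and the output format are hypothetical stand-ins. On Python 3.7 and later a plain dict preserves insertion order, so it can play the role of an ordered set without an extra dependency (that guarantee did not exist in the Python 2.7.5 the patch was tested with).

```python
import hashlib


def unique_in_order(items):
    """De-duplicate while keeping first-seen order (an 'ordered set')."""
    return list(dict.fromkeys(items))


def render_wrapper(class_name, dependencies, method_names):
    """Hypothetical stand-in for generating one wrapper source file."""
    lines = ["// wrapper for " + class_name]
    # Dependencies keep their declaration order, minus duplicates.
    for dep in unique_in_order(dependencies):
        lines.append('#include "' + dep + '.h"')
    # Methods are sorted so that discovery order never leaks into the output.
    for name in sorted(set(method_names)):
        lines.append("// method: " + name)
    return "\n".join(lines) + "\n"


def digest(text):
    """Content hash, similar in spirit to what a compiler cache keys on."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same class discovered with its methods in two different orders.
first = render_wrapper("Foo", ["Object", "String", "Object"], ["b", "a", "c"])
second = render_wrapper("Foo", ["Object", "String"], ["c", "b", "a"])
same_output = digest(first) == digest(second)
```

Because both renderings are byte-identical, a content-based cache would treat the second build of this file as a hit.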
> --- [following repeated from mailing list] ---
> Performance Statistics :-
> The following are some quick and dirty statistics for building PyLucene itself with JCC (including Java Lucene, which accounts for about 30 seconds up front). The JCC files are split using --files 8, and each build is preceded by a make clean:
> Serial (unpatched):
> real    5m1.502s
> user    5m22.887s
> sys     0m7.749s
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
> real    1m37.382s
> user    7m16.658s
> sys     0m8.697s
> Furthermore, some additional changes were made to the wrapped file generation to make the generated code more ccache friendly (additional deterministic sorting of methods and some use of an ordered set).  With these in place, ccache installed, and the CC and CCACHE_COMPILERCHECK environment variables set to "ccache gcc" and "content" respectively, subsequent compilation time is reduced again as follows:
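The environment described above could be prepared from Python as in the sketch below. The variable names and values come from the message; the helper names ccache_build_env and ccache_available are made up for illustration.

```python
import os
import shutil


def ccache_build_env(base_env=None, compiler="gcc"):
    """Return a copy of the environment with ccache wrapping the compiler."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = "ccache " + compiler
    # Hash the compiler binary's contents instead of its mtime and size.
    env["CCACHE_COMPILERCHECK"] = "content"
    return env


def ccache_available():
    """True if a ccache executable can be found on PATH."""
    return shutil.which("ccache") is not None
```

The resulting dictionary can be passed as env= to subprocess.run() when launching the build, for example the make invocation used for the timings here.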
> Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache enabled):
> real    0m43.051s
> user    1m10.392s
> sys     0m4.547s
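For comparison, the wall-clock ("real") figures from the three runs above can be turned into speedup factors with a few lines of Python. The helper names are illustrative; the timings are copied from the message.

```python
import re


def to_seconds(stamp):
    """Convert a time(1) style stamp such as '5m1.502s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time stamp: " + stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# 'real' times reported for the PyLucene build.
REAL_TIMES = {
    "serial": "5m1.502s",
    "parallel": "1m37.382s",
    "parallel+ccache": "0m43.051s",
}


def speedups(times, baseline="serial"):
    """Speedup of every run relative to the baseline run."""
    base = to_seconds(times[baseline])
    return {name: base / to_seconds(stamp) for name, stamp in times.items()}


factors = speedups(REAL_TIMES)
```

This works out to roughly 3.1x for the parallel build and roughly 7.0x with a warm ccache, both relative to the serial build.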
> This was a run in which nothing changed between builds, so a realistic run in which changes occur will land somewhere between 0m43.051s and 1m37.382s, depending on how drastic the change was. If many changes are expected and you want to stay cache friendly, a higher --files value would probably help (to an extent); ideally use --files separate, although that doesn't currently work for me (need to investigate).
> We mostly use the PyLucene build as a test bed because others can repeat it, rather than only showing numbers for our own application builds; we also use it to run the unit test suite after changes to JCC itself, to make sure it still works as intended for PyLucene.  For illustration, though, our application takes 1m53s to compile with JCC from scratch serially, 0m31s in parallel (8 jobs), 0m14s in parallel with ccache enabled and minimal changes, and 0m8s with ccache and no changes.  A very agreeable result!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: [jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

Posted by Andi Vajda <va...@apache.org>.
No, no review yet. My thoughts are the same as last time - maintaining a monkeypatch of distutils is a bit scary. But I need to take a closer look first.

Andi..
