Posted to pylucene-dev@lucene.apache.org by Lee Skillen <ls...@vulcanft.com> on 2014/07/14 19:01:39 UTC

JCC Parallel/Multiprocess Compilation + Caching

Hi,

We've been using JCC a lot recently during development and often need
to recompile our JCC-based extensions due to changes in the wrapped
code. Doing so incurs a reasonably lengthy re-compilation effort due
to the serial nature of the extension building via distutils.

To help with this we have a prospective patch that adds parallel
building support to JCC on Linux by monkey patching distutils (based
partially on public domain code).  Would there be interest in
integrating this into the mainline trunk?

The following are some quick and dirty statistics for building
PyLucene itself with JCC (incl. the Java Lucene build, which accounts
for about 30 seconds upfront). The JCC files are split using
--files 8, and each build is preceded by a make clean:

Serial (unpatched):

real    5m1.502s
user    5m22.887s
sys     0m7.749s

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):

real    1m37.382s
user    7m16.658s
sys     0m8.697s
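
As a quick sanity check on the two sets of figures above, here is a small Python sketch that converts the `time` output to seconds and compares the runs. The helper name parse_time is purely illustrative. It works out to roughly a 3.1x wall-clock speedup for about 1.35x the total user CPU time.

```python
import re


def parse_time(stamp):
    """Convert a shell `time` value such as '5m1.502s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time value: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the serial and parallel runs above.
serial = {"real": "5m1.502s", "user": "5m22.887s"}
parallel = {"real": "1m37.382s", "user": "7m16.658s"}

speedup = parse_time(serial["real"]) / parse_time(parallel["real"])
cpu_overhead = parse_time(parallel["user"]) / parse_time(serial["user"])

print("wall-clock speedup: %.2fx" % speedup)
print("user CPU time ratio: %.2fx" % cpu_overhead)
```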

Furthermore, some additional changes were made to the wrapped file
generation to make the generated code more ccache friendly (additional
deterministic sorting for methods and some usage of an ordered set).
With these in place, ccache installed, and the CC and
CCACHE_COMPILERCHECK environment variables set to "ccache gcc" and
"content" respectively, subsequent compilation time is reduced again
as follows:

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs,
ccache enabled):

real    0m43.051s
user    1m10.392s
sys     0m4.547s
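
For anyone wanting to reproduce the ccache setup, a minimal sketch of the environment described above follows. The function name ccache_env and its parameters are hypothetical and not part of JCC. It only sets the two variables when a ccache executable is actually on the PATH, and otherwise leaves the environment alone.

```python
import os
import shutil


def ccache_env(base_env=None, compiler="gcc"):
    """Return a copy of the environment configured for ccache.

    If no ccache executable is found on PATH, the copy is returned
    unchanged so that the build falls back to the plain compiler.
    """
    env = dict(os.environ if base_env is None else base_env)
    if shutil.which("ccache", path=env.get("PATH")) is None:
        return env
    env["CC"] = "ccache " + compiler          # e.g. "ccache gcc"
    env["CCACHE_COMPILERCHECK"] = "content"   # hash the compiler binary's content
    return env
```

The result can then be passed to the build, for example subprocess.check_call(["make"], env=ccache_env()).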

Nothing changed between runs here, so for a realistic run in which
changes occur the figure will be somewhere between 0m43.051s and
1m37.382s, depending on how drastic the change was. If many changes
are expected and you want to keep the build cache friendly, then using
a higher --files value would probably work (to an extent), or ideally
--files separate, although that doesn't currently work for me (need to
investigate).
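
On the cache friendliness point: ccache can only hit when the generated source comes out the same from one run to the next, which is what the deterministic sorting and ordered set changes are for. The following is only an illustrative sketch of that idea. The function names and the (name, signature) data layout are assumptions and this is not the actual JCC generator code.

```python
def ordered_unique(items):
    """Drop duplicates while keeping first-seen order (a simple ordered set)."""
    return list(dict.fromkeys(items))


def stable_methods(methods):
    """Sort (name, signature) pairs so output order never depends on hashing."""
    return sorted(methods, key=lambda m: (m[0], m[1]))


def render(class_name, methods, includes):
    """Emit a toy 'generated file' whose text is identical across runs."""
    lines = ['#include "%s"' % inc for inc in ordered_unique(includes)]
    lines.append("// wrappers for " + class_name)
    for name, signature in stable_methods(methods):
        lines.append("// %s%s" % (name, signature))
    return "\n".join(lines)
```

Iterating over a plain set() of methods or includes can yield a different order from one process to the next, which changes the generated file and defeats the cache even though nothing meaningful changed.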

Any thoughts appreciated.

Cheers,
Lee

-- 
Lee Skillen

Vulcan Financial Technologies
1st Floor, 47 Malone Road, Belfast, BT9 6RY

Office:  +44 (0)28 95 817888
Web:     www.vulcanft.com

Re: JCC Parallel/Multiprocess Compilation + Caching

Posted by Lee Skillen <ls...@vulcanft.com>.
On 14 July 2014 23:19, Andi Vajda <va...@apache.org> wrote:
>
>  Hi Lee,
>
>> On Jul 14, 2014, at 19:01, Lee Skillen <ls...@vulcanft.com> wrote:
>>
>> Any thoughts appreciated.
>
> This is a pretty cool feature !
> The scary part is depending on monkey patching distutils...
> When that breaks, it's a pain to maintain: I've had to keep making my monkey-patching workaround for the now years-old setuptools issue-43 more complicated as more and more branches and versions of setuptools have appeared over time.

Absolutely, I agree that these kinds of things can be a nightmare, and
I have seen that setuptools is already being patched for shared build
capability!

The main solace here is that this would be optional functionality that
is only enabled if the monkey-patching succeeds for the current
platform and the --jobs parameter is greater than 1 (or set to 'N' to
specify all available cores).  It may also help that distutils itself
is part of the standard library so may be less susceptible to change
than setuptools. Looking at the history for ccompiler.py
(http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py),
the body of CCompiler.compile() has barely changed since 2002 and
the signature hasn't changed at all.
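
To make the shape of this concrete, here is a minimal sketch of the general technique, not the actual patch. The names resolve_jobs, make_parallel_compile and enable_parallel_build are hypothetical. The replacement mirrors the stock CCompiler.compile() body and relies on its private helpers (_setup_compile, _get_cc_args, _compile), and it assumes each _compile() call mostly waits on an external compiler process, so a thread pool is enough. If the import fails (distutils was removed from the standard library in Python 3.12), it simply reports failure so the caller can stay serial.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def resolve_jobs(value):
    """Map a --jobs style value to a worker count; 'N' means all cores."""
    if str(value).upper() == "N":
        return multiprocessing.cpu_count()
    return max(1, int(value))


def make_parallel_compile(jobs):
    """Build a drop-in replacement for CCompiler.compile()."""
    def parallel_compile(self, sources, output_dir=None, macros=None,
                         include_dirs=None, debug=0, extra_preargs=None,
                         extra_postargs=None, depends=None):
        macros, objects, extra_postargs, pp_opts, build = self._setup_compile(
            output_dir, macros, include_dirs, sources, depends, extra_postargs)
        cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)

        def compile_one(obj):
            # Only objects present in the build map need compiling.
            if obj in build:
                src, ext = build[obj]
                self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)

        pool = ThreadPool(jobs)
        try:
            pool.map(compile_one, objects)  # blocks until every object is done
        finally:
            pool.close()
            pool.join()
        return objects
    return parallel_compile


def enable_parallel_build(jobs):
    """Patch distutils if possible; return False to signal a serial fallback."""
    jobs = resolve_jobs(jobs)
    if jobs <= 1:
        return False
    try:
        from distutils.ccompiler import CCompiler
    except ImportError:
        return False
    CCompiler.compile = make_parallel_compile(jobs)
    return True
```

A caller would invoke enable_parallel_build() once before setup() runs and carry on with the normal serial build if it returns False.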

>
> Any chance these distutils patches could be merged into distutils upstream first ?

Unfortunately it might be difficult to get it accepted upstream,
mainly because other modules that use distutils.Extension() may hit
dependency race conditions when builds are parallelised (much like
recursively building projects via make with parallel jobs active).
This isn't an issue with JCC builds since there is no recursion or odd
dependency issues, everything is built into one place, and all
generation takes place upfront.

> If you're using JCC only, there is no need to build PyLucene but I assume you know that already. Building JCC by itself is pretty fast.

Aye, we're mostly using the PyLucene build as a test bed since it
is repeatable for others, rather than just showing numbers for our own
application compilations; we also use it to run the unit test suite
after changes to JCC itself to ensure it still works as intended for
PyLucene.  For illustrative purposes though, our application takes
1m53s to compile with JCC from scratch serially, 0m31s in parallel (8
jobs), 0m14s in parallel with ccache enabled and minimal changes, and
0m8s with ccache and no changes.  A very agreeable result!

I realise this one is a bit more risky, but we're happy to tidy up the
patch and submit it anyway if the interest is there, and we'll leave
it in your hands to decide. :-)

Cheers,
Lee

> Andi..

Re: JCC Parallel/Multiprocess Compilation + Caching

Posted by Andi Vajda <va...@apache.org>.
 Hi Lee,

> On Jul 14, 2014, at 19:01, Lee Skillen <ls...@vulcanft.com> wrote:
> 
> Any thoughts appreciated.

This is a pretty cool feature !
The scary part is depending on monkey patching distutils... 
When that breaks, it's a pain to maintain: I've had to keep making my monkey-patching workaround for the now years-old setuptools issue-43 more complicated as more and more branches and versions of setuptools have appeared over time.

Any chance these distutils patches could be merged into distutils upstream first ?

If you're using JCC only, there is no need to build PyLucene but I assume you know that already. Building JCC by itself is pretty fast.

Andi..
