Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2010/04/01 06:03:49 UTC

Parallel tests in Benchmark

Hi

I'd like to summarize a discussion I had w/ Robert and Mike last night on
IRC, about the parallelism of tasks in Benchmark:

For some reason, ever since parallel tasks were introduced, when I run 'ant
test' from the contrib/benchmark folder (or the root), the tests just hang
at some point, after WriteLineDocTaskTest finishes. What's very weird is
that it seems I'm the only one experiencing this, and so for a long time I
thought it's just a problem w/ my environment ... until yesterday when I did
a fresh checkout of trunk, to a fresh folder and project, and the tests
still got stuck.

The thread dump does not show anything related to Lucene code, only to
Ant. The main thread is waiting on
org/apache/tools/ant/taskdefs/Parallel.spinThreads, another on
org/apache/tools/ant/taskdefs/Execute.waitFor and two others on
java/io/FileInputStream.read. Annoyingly, but conveniently for debugging,
it happens very consistently on my machine - the tests sometimes pass, but
90% of the time they hang.
Running w/ -Drunsequential=1 consistently succeeds.

We've explored different ways to understand the cause of the problem, and
found several improvements and a workaround, but unfortunately no
definite resolution:

* As a last resort, we can add the runsequential property to benchmark's
build.xml, which forces the Benchmark tests to run sequentially. Since
that's a tiny package which takes a few seconds to run anyway, and
parallelism doesn't improve much (it actually runs slower, when it passes,
on my machine: parallel=15 sec, seq=11 sec), this might be acceptable.

* Moving the junit temp files (such as that flag file) to the temp
directory each test uses. This is actually a good thing to do anyway (thanks
Robert for spotting that), because it avoids accidental commits of such
files :) and doesn't clutter the main environment. We did that because when
I hit Ctrl+C to stop one of the runs which hung, we received a FNFE on a
junit flag file ("file is being accessed by another process", or something
like that), and thought this might be related to the hangs I'm seeing.
Anyway, multiple JVMs attempt to access this file concurrently, which seems
bad.

* Explore the JUnit Formatter code under src/test, since it uses file
locking. I've disabled locks (using NoLockFactory), however the test still
hung.

* Change common-build.xml threadsPerProcessor to '1' instead of '2'. We
think that might be a good thing to do anyway - if people run on machines
with just one CPU, threading is not expected to help much, as opposed to
running on multiple CPUs. But we don't want to force it on anyone, so we're
thinking of changing the default to '1' and introducing a property
'threadsPerProcessor' which users can set explicitly.
** Surprisingly, when I set it to '1' or '10' (I run on a dual-core Thinkpad
W500), the test consistently passes - it just doesn't like the value '2'. At
least it passed for as long as I ran it; maybe a hang is still lurking
around the corner somewhere.

* We made sure the benchmark tests indeed read/write the test data files
from/to unique directories. But like I said - there is no hang in Lucene
code reported in the thread dump.
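
For illustration, the threadsPerProcessor and temp-directory items above
could be sketched in Ant roughly as follows. This is a minimal sketch under
assumptions, not the actual common-build.xml: the project name, the
junit.tmp.dir property and the directory layout are hypothetical; only the
threadsPerProcessor attribute of <parallel> and the fork/tempdir attributes
of <junit> are standard Ant.

```xml
<project name="parallel-tests-sketch" default="test" basedir=".">

  <!-- Default to 1 thread per processor. Ant properties are immutable, so a
       value passed on the command line (-DthreadsPerProcessor=2) is set
       first and this declaration is then ignored. -->
  <property name="threadsPerProcessor" value="1"/>

  <!-- Hypothetical root for junit temp files, outside the source tree. -->
  <property name="junit.tmp.dir" location="${java.io.tmpdir}/junit-sketch"/>

  <target name="test">
    <mkdir dir="${junit.tmp.dir}/1"/>
    <mkdir dir="${junit.tmp.dir}/2"/>
    <parallel threadsPerProcessor="${threadsPerProcessor}">
      <!-- Each forked junit run writes its temp files to its own
           directory, so the JVMs never touch the same file. -->
      <junit fork="yes" tempdir="${junit.tmp.dir}/1">
        <!-- classpath and first batch of tests omitted -->
      </junit>
      <junit fork="yes" tempdir="${junit.tmp.dir}/2">
        <!-- classpath and second batch of tests omitted -->
      </junit>
    </parallel>
  </target>
</project>
```

With something like this, 'ant test' would use the default of 1, while
'ant test -DthreadsPerProcessor=2' would restore the current behaviour.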

It was very late last night when we stopped, and my eyes were tired, so I
didn't summarize it right away. Robert, I hope I've captured everything we
did, if not please add.

Anyone's got any suggestions? It's unfortunate that I'm the only one running
into this problem, because whatever the suggestions are, you'll probably
need me to confirm them :). And I'm going away for 3 days (camping - no
internet ... well at least no laptop :)), so unless someone has a suggestion
within the coming few hours, we can continue that when I get back.

Shai

Re: Parallel tests in Benchmark

Posted by Shai Erera <se...@gmail.com>.

Ok let's do that (add runsequential to benchmark and all the rest). If
I run into this elsewhere as well I will report and we can talk
then about trying to find a solution for this. If it's just benchmark
then I think we'll be ok.

Shai

On Thursday, April 1, 2010, Robert Muir <rc...@gmail.com> wrote:
> I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and listed the things we can do for sure now, such as lowering threadsPerProcessor (and allowing someone to use a system property to override this) and fixing junit temp files to be in the temp directory. Additionally I would like to fix the ant library problem as mentioned there. It works great from the command-line but we should improve this for IDE-users, so they do not see a compile error.
>
> I am personally for the idea of adding the runsequential property to benchmark's build.xml, to force it to run serially. While I am unable to reproduce your problem, it does not surprise me, as I had a tough time trying to prevent benchmark tests from stepping on each other's toes.
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: Parallel tests in Benchmark

Posted by Robert Muir <rc...@gmail.com>.

I think you got everything. I reopened the JIRA issue too (LUCENE-1709) and
listed the things we can do for sure now, such as lowering
threadsPerProcessor (and allowing someone to use a system property to
override this) and fixing junit temp files to be in the temp directory.
Additionally I would like to fix the ant library problem as mentioned there.
It works great from the command-line but we should improve this for
IDE-users, so they do not see a compile error.

I am personally for the idea of adding the runsequential property to
benchmark's build.xml, to force it to run serially. While I am unable to
reproduce your problem, it does not surprise me, as I had a tough time
trying to prevent benchmark tests from stepping on each other's toes.
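
A rough sketch of what forcing the serial path could look like in Ant,
assuming the test target is split into a sequential and a parallel variant.
The project and target names here are hypothetical and the echo tasks stand
in for the real test runs; the if/unless attributes on <target> are standard
Ant.

```xml
<project name="benchmark-sketch" default="test" basedir=".">

  <!-- Declaring the property is what matters: with a plain property name,
       if/unless on a target only check whether the property is set, which
       is also why -Drunsequential=1 on the command line has the same
       effect. -->
  <property name="runsequential" value="true"/>

  <target name="test" depends="test-sequential,test-parallel"/>

  <!-- Runs only when runsequential is set. -->
  <target name="test-sequential" if="runsequential">
    <echo message="running all tests one after another"/>
  </target>

  <!-- Skipped when runsequential is set. -->
  <target name="test-parallel" unless="runsequential">
    <parallel threadsPerProcessor="1">
      <echo message="running test batch 1"/>
      <echo message="running test batch 2"/>
    </parallel>
  </target>
</project>
```

Since the property is set in the module's own build file, other modules that
do not declare it would keep running their tests in parallel.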


-- 
Robert Muir
rcmuir@gmail.com