
Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2013/06/07 06:10:36 UTC

Random Errors

testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)  Time elapsed: 1.569 sec  <<< ERROR!
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/tmp/mahout-TestDistributedRowMatrix-8146721276637462528/testdata/transpose-24 already exists
	at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
	at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:238)
	at org.apache.mahout.math.hadoop.TestDistributedRowMatrix.testTranspose(TestDistributedRowMatrix.java:87)


Anyone seen this?  I'm guessing there are some conflicts due to the order the methods are run in.
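One way to rule out ordering or sharing conflicts is to make sure no two tests (or two runs) can ever share an output path. The error above comes from Hadoop's FileOutputFormat.checkOutputSpecs refusing an output directory that already exists. The java.nio sketch below is an assumption about how per-test directories could be allocated, not how Mahout's tests currently do it; `TestOutputDirs.freshOutputDir` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Mahout's test utilities): gives every caller
// a brand-new, not-yet-existing output path, so two tests can never both try to
// write to a fixed name such as ".../testdata/transpose-24".
final class TestOutputDirs {
  private TestOutputDirs() {}

  /**
   * Creates a unique parent directory under java.io.tmpdir and returns a child
   * path inside it that does not exist yet, which is the state a Hadoop job
   * expects its output directory to be in.
   */
  static Path freshOutputDir(String testName) {
    try {
      Path parent = Files.createTempDirectory("mahout-" + testName + "-");
      return parent.resolve("output");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```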

Re: Random Errors

Posted by Suneel Marthi <su...@yahoo.com>.

I agree with Sean again; I am seeing these race conditions with HDFS very consistently.

See below:

testDistributedLanczosSolverEVJCLI(org.apache.mahout.math.hadoop.decomposer.TestDistributedLanczosSolverCLI)  Time elapsed: 66.102 sec  <<< ERROR!
java.lang.IllegalStateException: java.io.IOException: The distributed cache object file:/tmp/mahout-TestDistributedLanczosSolverCLI-7471344972447511552/tmp2/1370579360893548000/DistributedMatrix.times.inputVector/1370579360894119000 changed during the job from 6/7/13 12:29 AM to 6/7/13 12:29 AM
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.downloadCacheObject(TrackerDistributedCacheManager.java:401)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.localizePublicCacheObject(TrackerDistributedCacheManager.java:475)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:191)
    at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:182)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:124)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:437)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:279)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:110)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:207)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:159)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:118)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:290)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.math.hadoop.decomposer.TestDistributedLanczosSolverCLI.testDistributedLanczosSolverEVJCLI(TestDistributedLanczosSolverCLI.java:128)

completeJobToyExample(org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest)  Time elapsed: 141.7 sec  <<< ERROR!
java.io.IOException: The distributed cache object file:///Users/smarthi/opensourceprojects/Mahout/core/target/mahout-ParallelALSFactorizationJobTest-7107915352722998272/tmp/U-0/part-m-00000 changed during the job from 6/7/13 12:19 AM to 6/7/13 12:19 AM
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.downloadCacheObject(TrackerDistributedCacheManager.java:401)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.localizePublicCacheObject(TrackerDistributedCacheManager.java:475)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:191)
    at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:182)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:124)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:437)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.runSolver(ParallelALSFactorizationJob.java:329)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.run(ParallelALSFactorizationJob.java:188)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest.explicitExample(ParallelALSFactorizationJobTest.java:105)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest.completeJobToyExample(ParallelALSFactorizationJobTest.java:64)

I am reverting to the old configuration to get a clean build, not to mention that my MacBook Pro (i7 4-core processor, 8GB RAM, 5400rpm HDD) gets deep fried :) when I try running the tests.
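The "changed during the job from 6/7/13 12:29 AM to 6/7/13 12:29 AM" messages look self-contradictory. A plausible reading, assumed here rather than checked against the Hadoop source, is that the distributed cache compares modification times at a finer resolution than the minute-level format used in the message, so a file rewritten moments later by a concurrent test fails the check while the same time is printed twice. This stdlib sketch only demonstrates the formatting half of that; `TimestampResolution` is a hypothetical name.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustration only: two modification times that differ by less than a minute
// are different values, yet render identically with a SHORT date/time format.
final class TimestampResolution {
  private TimestampResolution() {}

  static String shortFormat(long epochMillis) {
    DateFormat df = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone so results are reproducible
    return df.format(new Date(epochMillis));
  }

  /** True when the timestamps differ but their minute-level renderings are equal. */
  static boolean looksUnchangedButIsNot(long before, long after) {
    return before != after && shortFormat(before).equals(shortFormat(after));
  }
}
```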

________________________________
 From: Sean Owen <sr...@gmail.com>
To: Mahout Dev List <de...@mahout.apache.org> 
Sent: Friday, June 7, 2013 8:51 AM
Subject: Re: Random Errors

Having looked at it recently: no, the parallelism is per-class, for
exactly this reason.

I suspect the problem is a race condition vis-a-vis HDFS. Usually an
operation like a delete is visible a moment later when a job starts, but
maybe not always. It could also be some internal source of randomness
somewhere in a library that can't be controlled externally, but I find
that an unlikely explanation for this.
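If this explanation holds, one defensive pattern is to treat "the old output is gone" as something to wait for rather than assume. A minimal sketch, assuming the local filesystem as in these test runs (the paths in the traces are all file: URIs); `Visibility.awaitAbsent` is hypothetical, and against HDFS the `Files.exists` call would be replaced by Hadoop's `FileSystem.exists(Path)`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: wait (with a bound) until a path is no longer visible
// before starting the next job, instead of assuming a delete is visible at once.
final class Visibility {
  private Visibility() {}

  /**
   * Polls until the path no longer exists or the timeout expires.
   * Returns true if the path is gone, false on timeout or interruption.
   */
  static boolean awaitAbsent(Path path, long timeout, TimeUnit unit) {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (Files.exists(path)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // still visible: fail loudly rather than race the job
      }
      try {
        Thread.sleep(10); // short back-off between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return true;
  }
}
```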

On Fri, Jun 7, 2013 at 1:03 PM, Sebastian Schelter
<ss...@googlemail.com> wrote:
> I'm also getting errors on a test when executing all tests. I don't get
> the error when I run the test in the IDE or via mvn on the command line.
>
> Do we now also have intra-test-class parallelism? If so, is there a way
> to disable it?
>
> --sebastian
>
>
> On 07.06.2013 09:11, Ted Dunning wrote:
>> This last one is actually more like a non-deterministic test that probably
>> needs a restart strategy to radically decrease the probability of failure
>> or needs a slightly more relaxed threshold.
>>
>>
>>
>> On Fri, Jun 7, 2013 at 7:32 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>> Here's another one:
>>> testClustering(org.apache.mahout.clustering.streaming.cluster.BallKMeansTest)  Time elapsed: 2.817 sec  <<< FAILURE!
>>> java.lang.AssertionError: expected:<625.0> but was:<753.0>
>>>         at org.junit.Assert.fail(Assert.java:88)
>>>         at org.junit.Assert.failNotEquals(Assert.java:743)
>>>         at org.junit.Assert.assertEquals(Assert.java:494)
>>>         at org.junit.Assert.assertEquals(Assert.java:592)
>>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeansTest.testClustering(BallKMeansTest.java:119)
>>>
>>>
>>> I suspect that we still have issues with the parallel testing, as it doesn't
>>> show up in repeated runs and it isn't consistent.
>>>
>>> --------------------------------------------
>>> Grant Ingersoll | @gsingers
>>> http://www.lucidworks.com
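Ted's two options for the BallKMeansTest failure quoted above, a restart strategy or a relaxed threshold, can be sketched without any Mahout code; both helper names below are hypothetical. If each attempt fails independently with probability p, all k attempts fail with probability p^k, so a check that fails 10% of the time fails only 0.1% of the time with three attempts. Independence is the key assumption: restarts only help if each attempt uses a different seed, which is why the sketch passes the seed in.

```java
import java.util.function.IntPredicate;

// Illustrative helpers, not Mahout code.
final class FlakyChecks {
  private FlakyChecks() {}

  /**
   * Restart strategy: runs the randomized check with seeds baseSeed, baseSeed + 1, ...
   * and passes as soon as one attempt passes.
   */
  static boolean passesWithRestarts(IntPredicate trial, int baseSeed, int attempts) {
    for (int i = 0; i < attempts; i++) {
      if (trial.test(baseSeed + i)) {
        return true;
      }
    }
    return false;
  }

  /** Relaxed threshold: accepts actual within a relative tolerance of expected. */
  static boolean withinRelative(double expected, double actual, double relTol) {
    return Math.abs(actual - expected) <= relTol * Math.abs(expected);
  }
}
```

For scale, the reported failure (expected 625.0, got 753.0) is about 20% off, so a tolerance would have to be fairly generous to absorb it.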

Re: Random Errors

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

If so, then this is interesting. It could be an internal VM thread that is
not filtered out for some reason. I would restrict tracking to the main
thread group only by adding:

@ThreadLeakGroup(Group.MAIN)


Dawid

On Mon, Jun 10, 2013 at 1:00 PM, Grant Ingersoll <gs...@apache.org> wrote:
> That was the whole stack trace, unfortunately.

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

That was the whole stack trace, unfortunately.

On Jun 10, 2013, at 2:35 AM, Dawid Weiss <da...@cs.put.poznan.pl> wrote:

> Grant, top of the stack trace is not sufficient to tell what was the
> offending thread. Copy-paste the entire stack, including nested
> exceptions. The console will also contain a full stack trace
> information at the moment the test framework detected a thread leak.
> It should be easy to tell what isn't cleaned up properly.
> 
> Dawid

Re: Random Errors

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

Grant, the top of the stack trace is not sufficient to tell which thread
was the offending one. Copy-paste the entire stack, including nested
exceptions. The console will also contain full stack trace
information from the moment the test framework detected a thread leak.
It should be easy to tell what isn't cleaned up properly.

Dawid

Re: Random Errors

Posted by Sean Owen <sr...@gmail.com>.

ToolRunner is just the messenger. It parses Hadoop args for you before
invoking your own job's run() method. It also checks exit status. By
itself it is only a bit of wrapper code. You have to use it or else
Hadoop does not pick up its command line args.
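To make this description concrete, here is a simplified, hypothetical stand-in for what ToolRunner does: peel off generic options, then hand the rest to the tool's run() and return its exit code. It handles only `-D key=value`; Hadoop's real GenericOptionsParser also understands options such as -conf, -fs and -files. `MiniToolRunner` and `SimpleTool` are invented names, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Hadoop's Tool/ToolRunner pair; the real
// classes use Configuration and GenericOptionsParser and support more options.
interface SimpleTool {
  int run(Map<String, String> conf, String[] args);
}

final class MiniToolRunner {
  private MiniToolRunner() {}

  /** Moves "-D key=value" pairs into conf and passes the remaining args to the tool. */
  static int run(SimpleTool tool, String[] args) {
    Map<String, String> conf = new HashMap<>();
    List<String> remaining = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if ("-D".equals(args[i]) && i + 1 < args.length) {
        String[] kv = args[++i].split("=", 2);
        conf.put(kv[0], kv.length > 1 ? kv[1] : "");
      } else {
        remaining.add(args[i]);
      }
    }
    // Whatever the tool returns is handed back to the caller as the exit code.
    return tool.run(conf, remaining.toArray(new String[0]));
  }
}
```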

On Sun, Jun 9, 2013 at 11:03 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Here's another theory:
>
> I've noticed in my own usage of Hadoop that using the ToolRunner.run method as part of unit tests often leads to bad things (it doesn't always exit cleanly, it seems) and we use it in a lot of places.  Anyone else experience this?
>

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

Here's another theory:

I've noticed in my own usage of Hadoop that using the ToolRunner.run method as part of unit tests often leads to bad things (it doesn't always exit cleanly, it seems) and we use it in a lot of places.  Anyone else experience this?


On Jun 9, 2013, at 5:53 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Another one:
> 
> Tests run: 100, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.266 sec <<< FAILURE!
> testViewSequentialAccessSparseVectorWritable {#18 seed=[D60713E4B2CC78FF:9F0DB2A85C2C8731]}(org.apache.mahout.math.VectorWritableTest)  Time elapsed: 0.128 sec  <<< ERROR!
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testViewSequentialAccessSparseVectorWritable {#18 seed=[D60713E4B2CC78FF:9F0DB2A85C2C8731]}(org.apache.mahout.math.VectorWritableTest): 
>   1) Thread[id=13, name=Thread-2, state=TERMINATED, group={null group}]
>        at (empty stack)
> 	at __randomizedtesting.SeedInfo.seed([D60713E4B2CC78FF:9F0DB2A85C2C8731]:0)

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

Another one:

Tests run: 100, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.266 sec <<< FAILURE!
testViewSequentialAccessSparseVectorWritable {#18 seed=[D60713E4B2CC78FF:9F0DB2A85C2C8731]}(org.apache.mahout.math.VectorWritableTest)  Time elapsed: 0.128 sec  <<< ERROR!
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testViewSequentialAccessSparseVectorWritable {#18 seed=[D60713E4B2CC78FF:9F0DB2A85C2C8731]}(org.apache.mahout.math.VectorWritableTest): 
   1) Thread[id=13, name=Thread-2, state=TERMINATED, group={null group}]
        at (empty stack)
	at __randomizedtesting.SeedInfo.seed([D60713E4B2CC78FF:9F0DB2A85C2C8731]:0)


On Jun 9, 2013, at 10:59 AM, Dawid Weiss <da...@cs.put.poznan.pl> wrote:

>> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at
> 
> This is because the test suite extends RandomizedTest and this in turn
> enables a feature of the runner named aptly "thread leak detection".
> The problem is that if you spawn threads from a test and then return
> to the framework those threads running in the background may affect
> further computations and be very difficult to debug. The runner
> detects such cases and fails a test that didn't clean up after itself
> properly.
> 
> There are a few workarounds:
> 
> 1) do clean up; after the test is over all threads it possibly starts
> should be join()'ed with.
> 
> 2) sometimes the above is not possible explicitly -- as with
> executors, for example. If it's known that threads may linger a bit
> but will eventually terminate a @ThreadLeakLingering(linger = 20000)
> can be applied which gives the maximum time to wait for stray threads.
> They still must terminate.
> 
> 3) sometimes threads should last for the entire suite's duration. The
> scope of detection can be changed with @ThreadLeakScope(Scope.SUITE).
> 
> 4) finally, if this all is not really needed the feature can be
> disabled by @ThreadLeakScope(Scope.NONE). I honestly think this is the
> worst scenario though because leaked threads are *very* difficult to
> debug if they do something wrong and affect test results.
> 
> Take a look at class annotations of:
> http://goo.gl/n7rYD
> 
> for an example of multiple configuration directives used in real life:
> 
> @ThreadLeakScope(Scope.SUITE)
> @ThreadLeakGroup(Group.MAIN)
> @ThreadLeakAction({Action.WARN, Action.INTERRUPT})
> @ThreadLeakLingering(linger = 20000) // Wait long for leaked threads
> to complete before failure. zk needs this.
> @ThreadLeakZombies(Consequence.IGNORE_REMAINING_TESTS)
> @TimeoutSuite(millis = 2 * TimeUnits.HOUR)
> @ThreadLeakFilters(defaultFilters = true, filters = {
>     QuickPatchThreadsFilter.class
> })
> 
> Dawid


Re: Random Errors

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at

This is because the test suite extends RandomizedTest, and this in turn
enables a feature of the runner aptly named "thread leak detection".
The problem is that if you spawn threads from a test and then return
to the framework, those threads running in the background may affect
further computations and be very difficult to debug. The runner
detects such cases and fails any test that didn't clean up after itself
properly.
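The core idea of leak detection can be shown with plain JDK calls: snapshot the live threads before a test body runs, snapshot again afterwards, and report any thread that is new and still alive. This is a hypothetical sketch, not the randomizedtesting runner's implementation, which also deals with lingering, thread groups and filters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, minimal leak check; not the randomizedtesting runner's code.
final class LeakCheck {
  private LeakCheck() {}

  /** All threads the JVM currently reports as alive. */
  static Set<Thread> liveThreads() {
    Set<Thread> threads = new HashSet<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive()) {
        threads.add(t);
      }
    }
    return threads;
  }

  /** Runs the body and returns threads that appeared during it and are still alive. */
  static Set<Thread> leakedBy(Runnable body) {
    Set<Thread> before = liveThreads();
    body.run();
    Set<Thread> after = liveThreads();
    after.removeAll(before);
    return after;
  }
}
```

Note that any thread the JVM itself happens to start while the body runs is reported too, which is the kind of false positive Grant sees with a com.apple.java.Usage thread.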

There are a few workarounds:

1) Clean up: after the test is over, all threads it may have started
should be join()'ed.

2) Sometimes the above is not explicitly possible, as with
executors, for example. If it's known that threads may linger a bit
but will eventually terminate, @ThreadLeakLingering(linger = 20000)
can be applied, which sets the maximum time to wait for stray threads.
They still must terminate.

3) Sometimes threads should last for the entire suite's duration. The
scope of detection can be changed with @ThreadLeakScope(Scope.SUITE).

4) Finally, if none of this is really needed, the feature can be
disabled with @ThreadLeakScope(Scope.NONE). I honestly think this is the
worst option, though, because leaked threads are *very* difficult to
debug if they do something wrong and affect test results.
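Workarounds 1 and 2 can be sketched together with plain JDK calls: join every thread a test started, but give all of them one shared linger budget, and report whichever are still alive when it runs out. `Lingering.joinWithLinger` is a hypothetical helper; the 20000 in the annotation would correspond to its `lingerMillis` argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper combining workaround 1 (join the threads you started)
// with workaround 2 (give stragglers a bounded linger time).
final class Lingering {
  private Lingering() {}

  /**
   * Joins each thread, sharing one overall budget of lingerMillis, and returns
   * the threads that are still alive once the budget is spent.
   */
  static List<Thread> joinWithLinger(List<Thread> threads, long lingerMillis) {
    long deadline = System.currentTimeMillis() + lingerMillis;
    List<Thread> stillAlive = new ArrayList<>();
    for (Thread t : threads) {
      long remaining = deadline - System.currentTimeMillis();
      // join(0) would wait forever, so only join while budget remains.
      if (remaining > 0 && !Thread.currentThread().isInterrupted()) {
        try {
          t.join(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // keep the flag; stop waiting
        }
      }
      if (t.isAlive()) {
        stillAlive.add(t);
      }
    }
    return stillAlive;
  }
}
```

For an ExecutorService, calling shutdown() followed by awaitTermination(timeout, unit) plays the same role as this join loop.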

Take a look at the class annotations of:
http://goo.gl/n7rYD

for an example of multiple configuration directives used in real life:

@ThreadLeakScope(Scope.SUITE)
@ThreadLeakGroup(Group.MAIN)
@ThreadLeakAction({Action.WARN, Action.INTERRUPT})
@ThreadLeakLingering(linger = 20000) // Wait long for leaked threads to complete before failure. zk needs this.
@ThreadLeakZombies(Consequence.IGNORE_REMAINING_TESTS)
@TimeoutSuite(millis = 2 * TimeUnits.HOUR)
@ThreadLeakFilters(defaultFilters = true, filters = {
    QuickPatchThreadsFilter.class
})

Dawid

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

Tests run: 100, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.75 sec <<< FAILURE!
testViewSequentialAccessSparseVectorWritable {#1 seed=[34643F377C10C8B9:3D6AC6E0C554E86F]}(org.apache.mahout.math.VectorWritableTest)  Time elapsed: 0.423 sec  <<< ERROR!
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from TEST scope at testViewSequentialAccessSparseVectorWritable {#1 seed=[34643F377C10C8B9:3D6AC6E0C554E86F]}(org.apache.mahout.math.VectorWritableTest): 
   1) Thread[id=13, name=Thread-2, state=RUNNABLE, group=main]
        at com.apple.java.Application.getAppBundleIdNative(Native Method)
        at com.apple.java.Application.getAppBundleId(Application.java:19)
        at com.apple.java.Usage.performReport(Usage.java:52)
        at com.apple.java.Usage.performAfterDelay(Usage.java:27)
	at __randomizedtesting.SeedInfo.seed([34643F377C10C8B9:3D6AC6E0C554E86F]:0)


This may be a hint. I don't get it when running the test standalone...
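The leaked thread in this trace is executing com.apple.java.Usage code, so it appears to belong to the JVM rather than to the test, which fits Dawid's guess about an internal VM thread. One way to handle such cases is a filter that exempts known JVM-internal threads from leak detection, similar in spirit to the ThreadLeakFilters list in Dawid's example. The stdlib sketch below matches on stack frames; the class name and the prefix list are assumptions, and a real filter would plug into the runner's own filter mechanism instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical filter: ignores threads whose current stack shows a known JVM-internal class.
final class InternalThreadFilter {
  private final List<String> classPrefixes;

  InternalThreadFilter(String... classPrefixes) {
    this.classPrefixes = Arrays.asList(classPrefixes);
  }

  /** True if any frame belongs to one of the configured class-name prefixes. */
  boolean isIgnorable(StackTraceElement[] stack) {
    for (StackTraceElement frame : stack) {
      for (String prefix : classPrefixes) {
        if (frame.getClassName().startsWith(prefix)) {
          return true;
        }
      }
    }
    return false;
  }

  boolean isIgnorable(Thread t) {
    return isIgnorable(t.getStackTrace());
  }
}
```

A stack-based check cannot classify a thread that has already terminated with an empty stack, like the TERMINATED Thread-2 in the other report, so it is only a partial answer.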

On Jun 9, 2013, at 8:50 AM, Sebastian Schelter <ss...@googlemail.com> wrote:

> I observe a similar behavior.

Re: Random Errors

Posted by Sebastian Schelter <ss...@googlemail.com>.

I observe a similar behavior.

On 09.06.2013 14:47, Grant Ingersoll wrote:
> I get a failure on the one below when running in parallel, but not standalone: 
> 
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.358 sec <<< FAILURE!
> testRun(org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest)  Time elapsed: 10.358 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2002> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest.testRun(SequenceFilesFromLuceneStorageMRJobTest.java:73)
> 
> 
> Interesting thing about this one is the Test class has only a single test and it has no randomization.
> 
> FWIW, it's also becoming increasingly clear to me that we need some notion of real integration tests that we can run against a Hadoop cluster (or at least a virtual Hadoop cluster).
> 
> -Grant

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

I get a failure on the one below when running in parallel, but not standalone: 

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.358 sec <<< FAILURE!
testRun(org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest)  Time elapsed: 10.358 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2002> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest.testRun(SequenceFilesFromLuceneStorageMRJobTest.java:73)


The interesting thing about this one is that the test class has only a single test, and it has no randomization.

FWIW, it's also becoming increasingly clear to me that we need some notion of real integration tests that we can run against a Hadoop cluster (or at least a virtual Hadoop cluster).

-Grant

On Jun 8, 2013, at 9:38 AM, Dawid Weiss <da...@cs.put.poznan.pl> wrote:

>> number generators. Where a test depends on a particular sequence, and
>> somewhere an RNG doesn't use the "RandomUtils" trick, it may have a
>> different state if other tests ran before.
> 
> I have a different solution for this in randomizedtesting framework (a
> Random instance cannot be shared from test to test, it will throw an
> exception if you do share it). This doesn't solve all the possible
> problems but proved quite effective at catching test dependencies.
> 
>> The surefire parameter just controls what order the *classes* run in AFAICT:
>> http://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#runOrder
> 
> Yeah, I was on the train when I wrote that e-mail. The trick I
> remembered is in fact inside JUnit 4.11 and onwards --
> https://github.com/junit-team/junit/blob/master/doc/ReleaseNotes4.11.md#test-execution-order
> 
> D.


Re: Random Errors

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> number generators. Where a test depends on a particular sequence, and
> somewhere an RNG doesn't use the "RandomUtils" trick, it may have a
> different state if other tests ran before.

I have a different solution for this in the randomizedtesting framework (a
Random instance cannot be shared from test to test; it will throw an
exception if you do share it). This doesn't solve all the possible
problems, but it has proved quite effective at catching test dependencies.
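This safeguard can be approximated with the JDK alone by remembering which thread created a Random and refusing calls from any other thread. The sketch is hypothetical and keyed on threads rather than tests; the randomizedtesting framework's actual mechanism is not described here and may differ.

```java
import java.util.Random;

// Hypothetical: a Random bound to the thread that created it.
final class OwnedRandom extends Random {
  private final Thread owner;

  OwnedRandom(long seed) {
    super(seed);
    this.owner = Thread.currentThread();
  }

  // Every nextInt/nextLong/nextDouble/... call funnels through next(bits).
  @Override
  protected int next(int bits) {
    if (Thread.currentThread() != owner) {
      throw new IllegalStateException(
          "Random owned by " + owner.getName() + " used from " + Thread.currentThread().getName());
    }
    return super.next(bits);
  }
}
```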

> The surefire parameter just controls what order the *classes* run in AFAICT:
> http://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#runOrder

Yeah, I was on the train when I wrote that e-mail. The trick I
remembered is in fact in JUnit 4.11 and later:
https://github.com/junit-team/junit/blob/master/doc/ReleaseNotes4.11.md#test-execution-order

D.

Re: Random Errors

Posted by Sean Owen <sr...@gmail.com>.

I would more readily expect the dependency to be due to the random
number generators. Where a test depends on a particular sequence and
somewhere an RNG doesn't use the "RandomUtils" trick, the RNG may have a
different state if other tests ran before.
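This point can be shown in a few lines: with one shared, once-seeded Random, the value a test sees depends on how many values earlier tests consumed, while creating a freshly seeded generator per test (the kind of reset the "RandomUtils" trick refers to) makes each test independent of order. The class and method names are hypothetical; they are not Mahout's RandomUtils API.

```java
import java.util.Random;

// Hypothetical illustration (not Mahout's RandomUtils API) of why a shared,
// once-seeded Random makes results depend on which tests ran first.
final class SeedingDemo {
  static final long SEED = 0xCAFEL;
  private static Random shared = new Random(SEED);

  private SeedingDemo() {}

  /** Resets the shared generator, as a harness might do once per JVM. */
  static void resetShared() {
    shared = new Random(SEED);
  }

  /** Simulates an earlier test drawing some values from the shared generator. */
  static void consumeShared(int draws) {
    for (int i = 0; i < draws; i++) {
      shared.nextInt();
    }
  }

  /** Order-dependent: the value depends on how much earlier callers consumed. */
  static long nextShared() {
    return shared.nextLong();
  }

  /** Order-independent: every test gets a generator in the same starting state. */
  static Random freshForTest() {
    return new Random(SEED);
  }
}
```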

The surefire parameter just controls what order the *classes* run in AFAICT:

http://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html#runOrder

On Sat, Jun 8, 2013 at 7:40 AM, Dawid Weiss <da...@gmail.com> wrote:
> In fact the core of it is that people assume method order will be that of
> declaration within the class and this is not guaranteed anywhere. Java7
> returns methods from reflection api in an undefined order and this
> propagates to junit. I believe surefire can be configured to use a junit
> runner that guarantees sorted method order. This said it is usually a bug
> to have method interdependencies.
>
> Dawid

Re: Random Errors

Posted by Dawid Weiss <da...@gmail.com>.

In fact, the core of it is that people assume the method order will be that of
declaration within the class, and this is not guaranteed anywhere. Java 7
returns methods from the reflection API in an undefined order, and this
propagates to JUnit. I believe surefire can be configured to use a JUnit
runner that guarantees a sorted method order. That said, it is usually a bug
to have method interdependencies.

Dawid
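The reflection point can be seen, and worked around, with plain JDK reflection: Class.getDeclaredMethods() makes no ordering promise, so anything that wants a stable order has to sort explicitly. Sorting by name, as below, is roughly what JUnit 4.11's @FixMethodOrder(MethodSorters.NAME_ASCENDING) offers; the helper and the sample class are hypothetical, and tests that depend on any particular order are still usually buggy.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: returns test method names in a stable, name-sorted order.
final class MethodOrder {
  private MethodOrder() {}

  static List<String> sortedTestMethodNames(Class<?> testClass) {
    List<String> names = new ArrayList<>();
    // Per its Javadoc, getDeclaredMethods() returns methods in no particular order;
    // Java 7 made that visible in practice.
    for (Method m : testClass.getDeclaredMethods()) {
      if (!m.isSynthetic() && m.getName().startsWith("test")) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }
}

// Example class whose test methods are declared out of alphabetical order on purpose.
final class OrderSample {
  void testB() {}
  void testA() {}
  void helper() {}
}
```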
On Jun 8, 2013 1:26 AM, "Robin Anil" <ro...@gmail.com> wrote:

> FYI:
> Java 7 does out of order execution for Junit.
>
> Here is a public note indicating this. I have seen this a lot
> http://intellijava.blogspot.com/2012/05/junit-and-java-7.html
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Fri, Jun 7, 2013 at 1:38 PM, Grant Ingersoll <gs...@apache.org>
> wrote:
>
> > I wonder too about method execution order, as some test methods
> > have dependencies on others.
> >
> > The speedup is worth it to me, but we should work to get to the bottom of
> > it.
> > >>>> already exists
> > >>>>>      at
> > >>>>
> >
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
> > >>>>>      at
> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
> > >>>>>      at
> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
> > >>>>>      at java.security.AccessController.doPrivileged(Native Method)
> > >>>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
> > >>>>>      at
> > >>>>
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > >>>>>      at
> > >>>>
> > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
> > >>>>>      at
> > org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
> > >>>>>      at
> > org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
> > >>>>>      at
> > >>>>
> >
> org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:238)
> > >>>>>      at
> > >>>>
> >
> org.apache.mahout.math.hadoop.TestDistributedRowMatrix.testTranspose(TestDistributedRowMatrix.java:87)
> > >>>>>
> > >>>>>
> > >>>>> Anyone seen this?  I'm guessing there are some conflicts due to
> order
> > >>>> methods are run in.
> > >>>>
> > >>>> --------------------------------------------
> > >>>> Grant Ingersoll | @gsingers
> > >>>> http://www.lucidworks.com
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
> >
>

Re: Random Errors

Posted by Robin Anil <ro...@gmail.com>.

FYI:
With Java 7, JUnit runs test methods out of declaration order.

Here is a public note describing this; I have seen it a lot:
http://intellijava.blogspot.com/2012/05/junit-and-java-7.html

Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Fri, Jun 7, 2013 at 1:38 PM, Grant Ingersoll <gs...@apache.org> wrote:

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

I also wonder about method execution order, i.e. whether some test methods conflict with or depend on others.

The speedup is worth it to me, but we should work to get to the bottom of it.

On Jun 7, 2013, at 8:51 AM, Sean Owen <sr...@gmail.com> wrote:

> Having looked at it recently -- no the parallelism is per-class, just
> for this reason.
> 
> I suspect the problem is a race condition vis-a-vis HDFS. Usually some
> operate like a delete is visible a moment later when a job starts, but
> maybe not always. It could also be some internal source of randomness
> somewhere in a library that can't be controlled externally, but I find
> that an unlikely explanation for this.
> 
> On Fri, Jun 7, 2013 at 1:03 PM, Sebastian Schelter
> <ss...@googlemail.com> wrote:
>> I'm also getting errors on a test when executing all tests. Don't get
>> the error when I run the test in the IDE or via mvn on the commandline.
>> 
>> Do we now also have intra-test class parallelism? If yes, is there a way
>> to disable this?
>> 
>> --sebastian
>> 
>> 
>> On 07.06.2013 09:11, Ted Dunning wrote:
>>> This last one is actually more like a non-deterministic test that probably
>>> needs a restart strategy to radically decrease the probability of failure
>>> or needs a slightly more relaxed threshold.
>>> 
>>> 
>>> 
>>> On Fri, Jun 7, 2013 at 7:32 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>> 
>>>> Here's another one:
>>>> testClustering(org.apache.mahout.clustering.streaming.cluster.BallKMeansTest)
>>>> Time elapsed: 2.817 sec  <<< FAILURE!
>>>> java.lang.AssertionError: expected:<625.0> but was:<753.0>
>>>>        at org.junit.Assert.fail(Assert.java:88)
>>>>        at org.junit.Assert.failNotEquals(Assert.java:743)
>>>>        at org.junit.Assert.assertEquals(Assert.java:494)
>>>>        at org.junit.Assert.assertEquals(Assert.java:592)
>>>>        at
>>>> org.apache.mahout.clustering.streaming.cluster.BallKMeansTest.testClustering(BallKMeansTest.java:119)
>>>> 
>>>> 
>>>> I suspect that we still have issues w/ the parallel testing, as it doesn't
>>>> show up in repeated runs and it isn't consistent.
>>>> 
>>>> On Jun 7, 2013, at 6:10 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>>> 
>>>>> testTranspose(org.apache.mahout.math.hadoop.TestDistributedRowMatrix)
>>>> Time elapsed: 1.569 sec  <<< ERROR!
>>>>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
>>>> file:/tmp/mahout-TestDistributedRowMatrix-8146721276637462528/testdata/transpose-24
>>>> already exists
>>>>>      at
>>>> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
>>>>>      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
>>>>>      at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
>>>>>      at java.security.AccessController.doPrivileged(Native Method)
>>>>>      at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>      at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>>>>>      at
>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
>>>>>      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
>>>>>      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
>>>>>      at
>>>> org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:238)
>>>>>      at
>>>> org.apache.mahout.math.hadoop.TestDistributedRowMatrix.testTranspose(TestDistributedRowMatrix.java:87)
>>>>> 
>>>>> 
>>>>> Anyone seen this?  I'm guessing there are some conflicts due to order
>>>> methods are run in.
>>>> 
>>>> --------------------------------------------
>>>> Grant Ingersoll | @gsingers
>>>> http://www.lucidworks.com
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: Random Errors

Posted by Sean Owen <sr...@gmail.com>.

Having looked at it recently: no, the parallelism is per-class, precisely
for this reason.

I suspect the problem is a race condition vis-a-vis HDFS. Usually an
operation like a delete is visible a moment later when a job starts, but
maybe not always. It could also be some internal source of randomness
somewhere in a library that can't be controlled externally, but I find
that an unlikely explanation for this.
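The `FileAlreadyExistsException` in the original report comes from `FileOutputFormat.checkOutputSpecs`, which refuses to start a job whose output directory already exists. A minimal sketch of one defensive pattern, assuming (the thread does not establish this) that the collision comes from an output name being reused while a stale or concurrent directory with that name is still visible: give every job run a freshly created parent directory and hand the job a child path that does not exist yet. It uses local `java.nio.file` rather than Hadoop's `FileSystem` API, and `freshOutputPath` is a hypothetical helper, not Mahout code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UniqueOutputDir {

    /**
     * Hypothetical helper: creates a brand-new parent directory under base and
     * returns a child path that does not exist yet, which is the state a
     * Hadoop-style output check expects before a job starts.
     */
    public static Path freshOutputPath(Path base, String jobName) {
        try {
            Files.createDirectories(base);
            // createTempDirectory generates a new name and creates the directory,
            // so two calls never hand back the same parent.
            Path parent = Files.createTempDirectory(base, jobName + "-");
            return parent.resolve("output");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "mahout-sketch");
        Path first = freshOutputPath(base, "transpose");
        Path second = freshOutputPath(base, "transpose");
        System.out.println(first);
        System.out.println(second);
        System.out.println("distinct: " + !first.equals(second));
        System.out.println("output exists yet: " + Files.exists(first));
    }
}
```

Cleanup of the base directory after the suite is left out for brevity. This does not help with the visibility race Sean describes if the same path is deleted and immediately reused; it sidesteps it by never reusing a path.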

On Fri, Jun 7, 2013 at 1:03 PM, Sebastian Schelter
<ss...@googlemail.com> wrote:

Re: Random Errors

Posted by Sebastian Schelter <ss...@apache.org>.

I did, and got no errors; the errors only occur when all tests are
executed.
On 07.06.2013 14:48, "Ted Dunning" <te...@gmail.com> wrote:

Re: Random Errors

Posted by Ted Dunning <te...@gmail.com>.

Note that you can run an entire class of tests from the mvn command line.


On Fri, Jun 7, 2013 at 2:03 PM, Sebastian Schelter
<ss...@googlemail.com> wrote:

Re: Random Errors

Posted by Sebastian Schelter <ss...@googlemail.com>.

I'm also getting errors on a test when executing all tests. I don't get
the error when I run the test in the IDE or via mvn on the command line.

Do we now also have parallelism within a test class? If so, is there a way
to disable it?

--sebastian


On 07.06.2013 09:11, Ted Dunning wrote:

Re: Random Errors

Posted by Ted Dunning <te...@gmail.com>.

This last one actually looks more like a non-deterministic test that probably
needs either a restart strategy, to radically decrease the probability of
failure, or a slightly more relaxed threshold.
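A minimal sketch of the restart idea on toy 1-D data: run several independently initialised k-means passes and keep the one with the lowest cost, so a single unlucky initialisation no longer decides whether an assertion like `expected:<625.0> but was:<753.0>` passes. This is plain Lloyd iteration, not Mahout's `BallKMeans`; the class, method names, data, and iteration count are all hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

public class RestartKMeans {

    /** Sum of squared distances from each point to its nearest center. */
    public static double cost(double[] points, double[] centers) {
        double total = 0.0;
        for (double p : points) {
            double best = Double.MAX_VALUE;
            for (double c : centers) {
                best = Math.min(best, (p - c) * (p - c));
            }
            total += best;
        }
        return total;
    }

    /** One Lloyd-style k-means run in 1-D, initialised from randomly chosen points. */
    public static double[] lloyd(double[] points, int k, Random rnd, int iterations) {
        double[] centers = new double[k];
        for (int i = 0; i < k; i++) {
            centers[i] = points[rnd.nextInt(points.length)];
        }
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double p : points) {
                int nearest = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(p - centers[j]) < Math.abs(p - centers[nearest])) {
                        nearest = j;
                    }
                }
                sum[nearest] += p;
                count[nearest]++;
            }
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) {
                    centers[j] = sum[j] / count[j]; // empty clusters keep their old center
                }
            }
        }
        return centers;
    }

    /** Restart strategy: several independent runs, keep the lowest-cost result. */
    public static double[] bestOfRestarts(double[] points, int k, int restarts, long seed) {
        Random rnd = new Random(seed);
        double[] best = null;
        for (int r = 0; r < restarts; r++) {
            double[] centers = lloyd(points, k, rnd, 20);
            if (best == null || cost(points, centers) < cost(points, best)) {
                best = centers;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2};
        double[] single = bestOfRestarts(points, 3, 1, 42L);
        double[] many = bestOfRestarts(points, 3, 10, 42L);
        Arrays.sort(many);
        System.out.println("1 run cost:   " + cost(points, single));
        System.out.println("10 runs cost: " + cost(points, many) + " centers " + Arrays.toString(many));
    }
}
```

Because both calls share a seed, the first restart in the 10-run loop replays the single run exactly, so the best-of-10 cost can never be worse. A test could then assert on that best result or, per the other option, compare against a slightly wider tolerance.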



On Fri, Jun 7, 2013 at 7:32 AM, Grant Ingersoll <gs...@apache.org> wrote:

Re: Random Errors

Posted by Grant Ingersoll <gs...@apache.org>.

Here's another one:
testClustering(org.apache.mahout.clustering.streaming.cluster.BallKMeansTest)  Time elapsed: 2.817 sec  <<< FAILURE!
java.lang.AssertionError: expected:<625.0> but was:<753.0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:494)
	at org.junit.Assert.assertEquals(Assert.java:592)
	at org.apache.mahout.clustering.streaming.cluster.BallKMeansTest.testClustering(BallKMeansTest.java:119)


I suspect that we still have issues with the parallel testing, as the failure doesn't show up in repeated runs and isn't consistent.

On Jun 7, 2013, at 6:10 AM, Grant Ingersoll <gs...@apache.org> wrote:

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com

Re: Random Errors

Posted by Suneel Marthi <su...@yahoo.com>.

I've never seen that before. I am seeing this one locally; it's random and not consistently reproducible.

testRemoval[2](org.apache.mahout.math.neighborhood.SearchSanityTest)  Time elapsed: 0.127 sec  <<< FAILURE!
java.lang.AssertionError: Previous second neighbor should be first expected:<0.0> but was:<15.917992420826485>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:494)
    at org.apache.mahout.math.neighborhood.SearchSanityTest.testRemoval(SearchSanityTest.java:166)
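Failures that are random and not consistently reproducible are much easier to chase when every run prints the seed behind its random data and can be replayed with it. A minimal sketch of that pattern; the `test.seed` system property, the default seed, and the class itself are hypothetical, not Mahout's test utilities. It only covers randomness that flows through the seeded `Random`; as noted earlier in the thread, randomness internal to a library would not be captured this way.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededTestData {

    /** Seed from the hypothetical "test.seed" system property, or a fixed default. */
    public static long testSeed() {
        return Long.getLong("test.seed", 0xC0FFEEL);
    }

    /** Same seed in, same vectors out, so a failing run can be replayed exactly. */
    public static double[][] randomVectors(long seed, int count, int dim) {
        Random rnd = new Random(seed);
        double[][] data = new double[count][dim];
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < dim; j++) {
                data[i][j] = rnd.nextGaussian();
            }
        }
        return data;
    }

    public static void main(String[] args) {
        long seed = testSeed();
        // Print the seed so a failure report carries everything needed to reproduce it.
        System.out.println("test seed = " + seed + " (rerun with -Dtest.seed=" + seed + ")");
        double[][] first = randomVectors(seed, 3, 2);
        double[][] replay = randomVectors(seed, 3, 2);
        System.out.println("replay identical: " + Arrays.deepEquals(first, replay));
    }
}
```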

________________________________
 From: Grant Ingersoll <gs...@apache.org>
To: "dev@mahout.apache.org" <de...@mahout.apache.org> 
Sent: Friday, June 7, 2013 12:10 AM
Subject: Random Errors
 
