Posted to dev@gora.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/05/18 23:01:16 UTC

[jira] [Created] (GORA-225) Various Issues with MemStore

Lewis John McGibbney created GORA-225:
-----------------------------------------

             Summary: Various Issues with MemStore 
                 Key: GORA-225
                 URL: https://issues.apache.org/jira/browse/GORA-225
             Project: Apache Gora
          Issue Type: Bug
          Components: gora-core, testing
    Affects Versions: 0.3
         Environment: Nutch 2.x HEAD, gora-core 0.3
            Reporter: Lewis John McGibbney
             Fix For: 0.4


In Nutch we have numerous testing scenarios which simulate persistence of data to Gora in some form or other. This has worked well until now.
Now that the gora-sql-0.1.1-incubating artifact is incompatible with gora-core 0.3, we need to address this situation in order to keep some degree of integrity within the Nutch codebase.
Specifically, a number of tests [0][1][2][3] all extend a utility testing class [4] which uses functionality from the gora-sql artifact.

My initial solution was to switch to using MemStore... which brought me to logging this issue!

Test [0] fails with the following unhelpful logging... I need to debug this much more thoroughly

{code}
Testcase: testGenerateHighest took 1.845 sec
	FAILED
expected:<2> but was:<0>
junit.framework.AssertionFailedError: expected:<2> but was:<0>
	at org.apache.nutch.crawl.TestGenerator.testGenerateHighest(TestGenerator.java:78)

Testcase: testGenerateHostLimit took 1.207 sec
	FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
	at org.apache.nutch.crawl.TestGenerator.testGenerateHostLimit(TestGenerator.java:134)

Testcase: testGenerateDomainLimit took 1.175 sec
	FAILED
expected:<1> but was:<0>
junit.framework.AssertionFailedError: expected:<1> but was:<0>
	at org.apache.nutch.crawl.TestGenerator.testGenerateDomainLimit(TestGenerator.java:185)

Testcase: testFilter took 2.31 sec
	FAILED
expected:<3> but was:<0>
junit.framework.AssertionFailedError: expected:<3> but was:<0>
	at org.apache.nutch.crawl.TestGenerator.testFilter(TestGenerator.java:239)
{code}

Tests [1][2] fail identically with the following stack trace

{code}   
Testcase: testInject took 1.931 sec
	Caused an ERROR
null
java.util.NoSuchElementException
	at java.util.TreeMap.key(TreeMap.java:1221)
	at java.util.TreeMap.firstKey(TreeMap.java:285)
	at org.apache.gora.memory.store.MemStore.execute(MemStore.java:122)
	at org.apache.nutch.util.CrawlTestUtil.readContents(CrawlTestUtil.java:112)
	at org.apache.nutch.crawl.TestInjector.readDb(TestInjector.java:104)
	at org.apache.nutch.crawl.TestInjector.testInject(TestInjector.java:62)
{code}
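
The trace above is consistent with firstKey() being called on an empty TreeMap, which the TreeMap Javadoc documents as throwing NoSuchElementException. The following is only a minimal sketch of that behaviour and of one possible guard. EmptyMapGuard and firstKeyOrNull are hypothetical names for illustration, not part of MemStore, and I have not confirmed what MemStore.execute actually does at line 122.

{code:java}
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

public class EmptyMapGuard {

  // Hypothetical helper: returns the first key, or null when the map is empty,
  // instead of letting firstKey() throw.
  public static <K, V> K firstKeyOrNull(SortedMap<K, V> map) {
    return map.isEmpty() ? null : map.firstKey();
  }

  // Returns true if firstKey() on an empty TreeMap throws NoSuchElementException.
  public static boolean firstKeyThrowsOnEmpty() {
    try {
      new TreeMap<String, String>().firstKey();
      return false;
    } catch (NoSuchElementException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("throws on empty: " + firstKeyThrowsOnEmpty());
    System.out.println("guarded result: " + firstKeyOrNull(new TreeMap<String, String>()));
  }
}
{code}

If this is indeed the cause, it would also suggest that nothing was written to the store before the read, which might be related to the "but was:<0>" failures in [0], although that is a guess on my part.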

Finally, a multithreaded test in [3] fails with the following

{code}
java.util.ConcurrentModificationException
	at java.util.TreeMap$NavigableSubMap$SubMapIterator.nextEntry(TreeMap.java:1594)
	at java.util.TreeMap$NavigableSubMap$SubMapKeyIterator.next(TreeMap.java:1655)
	at org.apache.gora.memory.store.MemStore$MemResult.nextInner(MemStore.java:81)
	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
	at org.apache.nutch.storage.TestGoraStorage.readWrite(TestGoraStorage.java:74)
	at org.apache.nutch.storage.TestGoraStorage.access$100(TestGoraStorage.java:41)
	at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:107)
	at org.apache.nutch.storage.TestGoraStorage$1.call(TestGoraStorage.java:102)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{code}
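
The same exception can be reproduced without threads, because the sub-map iterators of TreeMap are fail-fast: any structural modification of the backing map during iteration makes the next call to next() throw. The sketch below is a stand-alone illustration, assuming nothing about MemStore internals. SubMapCmeDemo is a hypothetical class name.

{code:java}
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

public class SubMapCmeDemo {

  // Returns true if adding a key to the backing TreeMap while iterating
  // over a sub-map key set raises ConcurrentModificationException.
  public static boolean iterationFailsAfterPut() {
    TreeMap<String, String> map = new TreeMap<String, String>();
    map.put("a", "1");
    map.put("b", "2");
    map.put("c", "3");
    try {
      for (String key : map.subMap("a", "d").keySet()) {
        // Structural modification: a new mapping is added mid-iteration.
        map.put("new-" + key, "x");
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("CME raised: " + iterationFailsAfterPut());
  }
}
{code}

In the multithreaded test the modification would come from another thread rather than from the loop body, but the iterator's check is the same.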

I believe that the final failure is due to the use of TreeMap [5] as a private object in MemStore. TreeMap implementations are not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedSortedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map, e.g.

   SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));

N.B. The note on TreeMap comes straight from the Oracle Javadoc.
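
One caveat worth noting: the Collections.synchronizedSortedMap Javadoc also says the caller must synchronize on the returned map manually while iterating over it or any of its views, so wrapping alone would probably not remove the exception in [3]. The sketch below shows two possible directions, neither of which has been tested against MemStore. SafeIterationSketch and its methods are hypothetical names, and using ConcurrentSkipListMap in place of TreeMap is my suggestion rather than anything in the current code.

{code:java}
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class SafeIterationSketch {

  // Option 1: a wrapped TreeMap. Iteration still has to hold the wrapper's lock.
  public static int countKeysSynchronized(SortedMap<String, String> syncMap) {
    int n = 0;
    synchronized (syncMap) {
      for (String key : syncMap.keySet()) {
        n++;
      }
    }
    return n;
  }

  // Option 2: ConcurrentSkipListMap. Its iterators are weakly consistent and
  // do not throw ConcurrentModificationException when the map changes.
  public static int iterateWhileInserting(ConcurrentNavigableMap<String, String> map) {
    int n = 0;
    for (String key : map.subMap("a", "d").keySet()) {
      // Inserted keys start with "z", so they fall outside the range [a, d).
      map.put("z" + n, "added during iteration");
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    SortedMap<String, String> wrapped =
        Collections.synchronizedSortedMap(new TreeMap<String, String>());
    wrapped.put("a", "1");
    wrapped.put("b", "2");
    System.out.println("synchronized count: " + countKeysSynchronized(wrapped));

    ConcurrentNavigableMap<String, String> skipList =
        new ConcurrentSkipListMap<String, String>();
    skipList.put("a", "1");
    skipList.put("b", "2");
    skipList.put("c", "3");
    System.out.println("skip list count: " + iterateWhileInserting(skipList));
  }
}
{code}

Option 1 keeps the sorted-map semantics but serializes readers and writers for the duration of each scan. Option 2 avoids the lock at the cost of iterators that may or may not reflect concurrent updates.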

[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestGenerator.java?view=markup
[1] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/crawl/TestInjector.java?view=markup
[2] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/fetcher/TestFetcher.java?view=markup
[3] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/storage/TestGoraStorage.java?view=markup
[4] http://svn.apache.org/viewvc/nutch/branches/2.x/src/test/org/apache/nutch/util/AbstractNutchTest.java?view=markup
[5] http://docs.oracle.com/javase/6/docs/api/java/util/TreeMap.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira