Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/06/29 18:42:43 UTC
[jira] [Comment Edited] (STANBOL-669) Property <-> Field cache of the SolrYard is not synchronized

    [ https://issues.apache.org/jira/browse/STANBOL-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404013#comment-13404013 ] 

Rupert Westenthaler edited comment on STANBOL-669 at 6/29/12 4:41 PM:
----------------------------------------------------------------------

Found the cause of this issue:

It has nothing to do with the EventJobManager (I will change the title of this issue). The cause was two LRU caches, implemented using LinkedHashMap, that the Apache Entityhub SolrYard uses to store the mappings between RDF properties and Solr field names (property <-> field).
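
For readers unfamiliar with the idiom, an LRU cache on top of LinkedHashMap is usually built as in the minimal sketch below. It is not the SolrYard source; the class name LruCache, the capacity and the example field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache built on LinkedHashMap (not the SolrYard code).
 * NOT thread-safe: see the LinkedHashMap Javadoc quoted below.
 */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration goes from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least recently accessed entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("rdfs:label", "label_field");
        cache.put("rdfs:comment", "comment_field");
        cache.get("rdfs:label");             // label becomes the most recent entry
        cache.put("foaf:name", "name_field"); // evicts rdfs:comment
        System.out.println(cache.keySet());   // prints [rdfs:label, foaf:name]
    }
}
```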

These LRU caches were not synchronized, based on the assumption that only put and get requests are used. Even removals (when the cache grows too big) would not be a problem, because dirty reads are not possible (even a removed entry would still have been correct), and in the worst case a cache miss would cause the same mapping to be calculated twice.

However, a quick RTFM of LinkedHashMap [1] would have saved a lot of time, as it states, even in bold letters: "In access-ordered linked hash maps, merely querying the map with get is a structural modification."
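
The effect of that sentence fits in a few lines. In this sketch (the class and method names are hypothetical), a plain get() reorders an access-ordered map, and a get() during iteration invalidates the iterator exactly like a put() would.

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

class AccessOrderDemo {
    /** Returns the key order after put(a), put(b), put(c), get(a). */
    static String orderAfterGet() {
        Map<String, Integer> map = newAccessOrderedMap();
        map.get("a"); // a plain read moves "a" to the end of the linked list
        return map.keySet().toString();
    }

    /** Shows that get() counts as a structural modification. */
    static boolean getDuringIterationThrows() {
        Map<String, Integer> map = newAccessOrderedMap();
        try {
            for (String key : map.keySet()) {
                map.get(key); // reorders entries and bumps the modification count
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    private static Map<String, Integer> newAccessOrderedMap() {
        // accessOrder = true, the mode used for LRU caches
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(orderAfterGet());            // [b, c, a]
        System.out.println(getDuringIterationThrows()); // true
    }
}
```

Within a single thread this surfaces as a ConcurrentModificationException; with two threads and no lock, nothing detects the conflicting relinking of entries.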

Debugging showed what happened: two threads access the LinkedHashMap at nearly the same time, and both try to change the order of its entries based on access time. They do not block each other; instead, both start to consume 100% of the CPU without ever coming to an agreement.

As a result, the method never returned, which at first looked as if some EnhancementJobs never completed in the EventJobManager (hence the original title of this issue).
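
One way to avoid the problem, sketched here under the assumption that a single lock per cache is acceptable, is to perform every access, including get(), while holding the same lock. The class SynchronizedLruCache, its get-or-compute method and the example property URIs are hypothetical and not taken from the actual fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical thread-safe LRU cache: every access holds the same lock. */
class SynchronizedLruCache<K, V> {
    private final Map<K, V> map;

    SynchronizedLruCache(int maxEntries) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, computing and caching it on a miss. */
    V get(K key, Function<? super K, ? extends V> compute) {
        synchronized (map) {
            V value = map.get(key); // structural modification: must hold the lock
            if (value == null) {
                value = compute.apply(key);
                map.put(key, value);
            }
            return value;
        }
    }

    int size() {
        synchronized (map) {
            return map.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedLruCache<String, String> cache = new SynchronizedLruCache<>(100);
        ExecutorService pool = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 10_000; i++) {
            String property = "http://example.org/prop/" + (i % 500);
            // Hypothetical property -> field name mapping
            pool.submit(() -> cache.get(property, p -> p.replaceAll("\\W", "_")));
        }
        pool.shutdown();
        boolean finished = pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("finished=" + finished + ", size=" + cache.size());
    }
}
```

The mapping function runs while the lock is held, which keeps the sketch simple and is probably fine for cheap property <-> field name calculations. Wrapping the map with Collections.synchronizedMap(...) would also make single calls safe, but a get followed by a put would still need external synchronization.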

As a bonus, the Stanbol Enhancer will have an integration test that can simulate concurrent enhancement requests. Stanbol users will also be able to run this test against their own Stanbol servers by using

    mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest

Currently this test uses English long abstracts from DBpedia (10k of them will be included in the integration test) and sends 1000 documents to the default chain using 20 concurrent threads. I plan to extend this test so that it can be configured via additional system properties.
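
The core of such a test can be sketched with a fixed thread pool: submit every document, then wait for each result with a timeout, so that a request that never returns (as in this issue) is reported instead of blocking the test forever. This is a hypothetical sketch, not the source of MultiThreadedTest; enhance() is a placeholder for the HTTP request to {stanbol-server}, and the thread count, document count and timeout are only example values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical load driver in the spirit of MultiThreadedTest. */
class ConcurrentLoadSketch {

    /** Placeholder for sending one document to the enhancer over HTTP. */
    static String enhance(String document) {
        return "enhanced:" + document.length();
    }

    /** Returns the number of requests that failed or did not return in time. */
    static int run(List<String> documents, int threads, long timeoutSeconds)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> results = new ArrayList<>();
        for (String doc : documents) {
            results.add(pool.submit(() -> enhance(doc)));
        }
        int failed = 0;
        for (Future<String> result : results) {
            try {
                result.get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (TimeoutException | ExecutionException e) {
                failed++;            // hung or failed request
                result.cancel(true); // interrupt it if still running
            }
        }
        pool.shutdownNow();
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("abstract " + i);
        }
        System.out.println("failed or hung requests: " + run(docs, 20, 60));
    }
}
```

Because each Future is awaited in turn, the timeout counts from when the driver starts waiting on that request, so it only approximates per-request latency; it is enough to turn a hang into a reported failure.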

A big thanks to Sebastian Schaffert for his help in tracking this down!

[1] http://download.java.net/jdk7/archive/b123/docs/api/java/util/LinkedHashMap.html
                
> Property <-> Field cache of the SolrYard is not synchronized
> ------------------------------------------------------------
>
>                 Key: STANBOL-669
>                 URL: https://issues.apache.org/jira/browse/STANBOL-669
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Entity Hub
>    Affects Versions: entityhub-0.10.0-incubating
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> UPDATE: Further analysis has shown that the EventJobManager was not the cause of this. See the 2nd comment for a proper description of the problem.
> - - -
> When bombarding the enhancer with multiple concurrent EnhancementJobs, the EventJobManager might not correctly process all requests, due to changes that do not correctly apply a writeLock on the EnhancementJob.
> As fixing this is not easy, I have already implemented a new integration test that sends long abstracts from DBpedia as content to the enhancer. The integration test includes enough data for 10,000 requests. It uses "java.util.concurrent.ExecutorService" for asynchronous requests and the "PoolingClientConnectionManager" of Apache HttpComponents (HttpClient) for sending multiple parallel requests.
> With 1000 requests and 10 threads, the problem is easy to reproduce.
