Posted to dev@uima.apache.org by "Richard Eckart de Castilho (Jira)" <de...@uima.apache.org> on 2020/05/08 19:17:00 UTC

[jira] [Updated] (UIMA-6232) Reduce overhead of createTypeSystemDescription() and friends

     [ https://issues.apache.org/jira/browse/UIMA-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho updated UIMA-6232:
---------------------------------------------
    Summary: Reduce overhead of createTypeSystemDescription() and friends  (was: Reduce overhead of createTypeSystemDescription())

> Reduce overhead of createTypeSystemDescription() and friends
> ------------------------------------------------------------
>
>                 Key: UIMA-6232
>                 URL: https://issues.apache.org/jira/browse/UIMA-6232
>             Project: UIMA
>          Issue Type: Improvement
>            Reporter: Richard Eckart de Castilho
>            Assignee: Richard Eckart de Castilho
>            Priority: Major
>             Fix For: 2.6.0uimaFIT
>
>
> uimaFIT offers a range of factory methods which use classpath scanning to locate type system descriptions, type priority definitions and index definitions. 
> The present implementation scans for each type of object once and then stores the locations in which the descriptors were found in a global static variable. The user can call a method to clear this variable and force a re-scan.
> Whenever client code calls a method such as {{createTypeSystemDescription()}} the cached locations are read, parsed, and a corresponding Java descriptor object is created and returned.
> This issue is about two problems with this approach:
> 1) Descriptor locations are discovered only for the ClassLoader situation that exists when the first scan takes place. If {{createTypeSystemDescription()}} is later called in the context of a ClassLoader with access to a different set of descriptions, this is not taken into account.
> 2) Parsing the XML files every time a method such as {{createTypeSystemDescription()}} is called slows uimaFIT down overall. These methods are potentially called very often, in particular every time {{createEngineDescription()}} or a similar method is called. Depending on the context, the parse overhead can have a significant impact on the overall execution time.
> As a solution for 1), we could adopt an approach similar to the one used for JCas wrapper classes in the JCasImpl: the locations are stored in a {{WeakHashMap}} mapping the current ClassLoader to the discovered locations. The "current" ClassLoader is obtained via the Spring {{ClassUtils.getDefaultClassLoader()}}, which is also (indirectly) used in many other places in uimaFIT. In particular, this method uses the Thread context ClassLoader if one is available.
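
The per-ClassLoader location cache proposed for 1) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the uimaFIT implementation: the class name DescriptorLocationCache, the method names, and the scanner callback are hypothetical, and currentClassLoader() is a plain-Java stand-in for Spring's ClassUtils.getDefaultClassLoader() so that the example has no dependencies.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of uimaFIT: caches scan results per ClassLoader.
final class DescriptorLocationCache {

    // Weak keys: an entry can be dropped once its ClassLoader is no longer
    // strongly reachable. The cached values must not reference the loader.
    private static final Map<ClassLoader, String[]> LOCATIONS =
            Collections.synchronizedMap(new WeakHashMap<>());

    private DescriptorLocationCache() {
        // static helper, no instances
    }

    // Stand-in (assumption) for Spring's ClassUtils.getDefaultClassLoader():
    // prefer the thread context ClassLoader and fall back if there is none.
    static ClassLoader currentClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = DescriptorLocationCache.class.getClassLoader();
        }
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl;
    }

    // Returns the locations known for the current ClassLoader and runs the
    // (hypothetical) scanner only the first time that loader is seen.
    static String[] locations(Function<ClassLoader, String[]> scanner) {
        return LOCATIONS.computeIfAbsent(currentClassLoader(), scanner);
    }

    // Counterpart to the existing "force a re-scan" method.
    static void clear() {
        LOCATIONS.clear();
    }
}
```

With this layout, a call made under a different thread context ClassLoader triggers its own scan instead of reusing the locations found during the very first scan.
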
> As a solution for 2), we keep a {{WeakHashMap}} cache not only for the locations but also for the parsed and aggregated XML files. When e.g. {{createTypeSystemDescription()}} is called and the cache already contains the respective descriptor, a deep clone of it is returned. A similar approach (cloning a descriptor) was recently also introduced into UIMA Core to avoid repeatedly loading and parsing default flow controller definitions.
>  
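
The idea behind 2), parsing once per ClassLoader and handing out deep clones, can be sketched as follows. This is a minimal illustration under assumptions rather than the actual change: ParsedDescriptorCache, Descriptor, deepCopy() and the parser callback are hypothetical stand-ins, and in uimaFIT the copy would presumably come from the UIMA descriptor object's own cloning support.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical helper, not part of uimaFIT: caches parsed descriptors.
final class ParsedDescriptorCache {

    // Hypothetical stand-in for a parsed, mutable descriptor object.
    static final class Descriptor {
        final List<String> typeNames;

        Descriptor(List<String> typeNames) {
            // Defensive copy, so every Descriptor owns its own list.
            this.typeNames = new ArrayList<>(typeNames);
        }

        Descriptor deepCopy() {
            return new Descriptor(typeNames);
        }
    }

    // Weak keys, as for the location cache: one parsed master copy per loader.
    private final Map<ClassLoader, Descriptor> cache = new WeakHashMap<>();

    // Parses at most once per ClassLoader. Callers always receive a fresh deep
    // copy, so modifying the result cannot corrupt the cached master copy.
    synchronized Descriptor get(ClassLoader loader, Supplier<Descriptor> parser) {
        Descriptor master = cache.get(loader);
        if (master == null) {
            master = parser.get();
            cache.put(loader, master);
        }
        return master.deepCopy();
    }
}
```

Returning a copy keeps the current behaviour in which each call yields an independent descriptor object, while the XML parsing cost is paid only on the first call for a given ClassLoader.
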


