Posted to dev@uima.apache.org by Peter Klügl <pe...@averbis.com> on 2017/08/03 08:51:24 UTC

Re: High Memory Usage in Uima ruta

Hi,


If you want me to improve the rules, you have to provide some
representative text.

If I make up some text and optimize the rules, I'll report a speedup of X.

Then you test the optimized rules and, assuming the results are still
correct (I have no realistic text to check that on), you measure a speedup of Y.

Then we are back where we started, with me asking for some representative text.


Best,


Peter


On 20.07.2017 at 14:52, Peter Klügl wrote:
> Hi,
>
>
> Can you provide a dummy/exemplary document for the optimization, as
> similar to your usual input as possible?
>
> The size of the document and the coverage and number of annotations are
> important key figures for the optimization.
>
>
> Best,
>
>
> Peter
>
>
> On 20.07.2017 at 12:27, Gaurav Dudeja wrote:
>> This is in reference to the question I raised on Stack Overflow. According to @Peter Kluegl, there is a lot of scope for improving the code,
>> so I am eager to learn how I can improve this script:
>> https://stackoverflow.com/questions/44351051/uima-ruta-out-of-memory-issue-in-spark-context
>>
>> =========================================================
>> TYPESYSTEM EDMTypeSystem;
>>
>> WORDLIST EnglishStopWordList = 'en/anchor/en_stopWords.txt';
>> WORDLIST FiltersList = 'en/anchor/AnchorFilters.txt';
>> DECLARE Filters, EnglishStopWords;
>> DECLARE Anchors, SpanStart,SpanClose;
>>
>> DocumentAnnotation{-> ADDRETAINTYPE(MARKUP)};
>>
>> DocumentAnnotation{-> MARKFAST(Filters, FiltersList)};
>>
>> STRING MixCharacterRegex = "[0-9]+[a-zA-Z]+";
>>
>> DocumentAnnotation{-> MARKFAST(EnglishStopWords, EnglishStopWordList,true)};
>> (SW | CW | CAP ) { -> MARK(Anchors, 1, 2)};
>> Anchors{CONTAINS(EnglishStopWords) -> UNMARK(Anchors)};
>>
>> (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM)? (SW | CW | CAP ) (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM) (SW | CW | CAP ) (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM)? EnglishStopWords? { -> MARK(Anchors, 1, 4)};
>> (SW | CW | CAP ) (SPECIAL{REGEXP("['\"=()\\[\\]-]")}| PM) EnglishStopWords? { -> MARK(Anchors, 1, 3)};
>>
>> Anchors{CONTAINS(MARKUP) -> UNMARK(Anchors)};
>> MixCharacterRegex -> Anchors;
>>
>> "<Value>"  -> SpanStart;
>> "</Value>" -> SpanClose;
>>
>> Anchors{-> CREATE(ExtractedData, "type" = "ANCHOR", "value" = Anchors)};
>>
>> SpanStart Filters? SPACE? ExtractedData SPACE? Filters? SpanClose{-> GATHER(Data, 2, 6, "ExtractedData" = 4)};
>> =========================================================