Posted to dev@ctakes.apache.org by Remy Sanouillet <re...@foreseemed.com> on 2019/02/28 18:46:42 UTC

Using a Lucene index dictionary

Hello cTakers!

I've been playing with the provided sample dictionaries that use
Lucene indexes generated by the CreateLuceneIndexForExampleDrugs.java
and CreateLuceneIndexForSnomedLikeSample.java scripts.

Judging from the log output, I think I have everything configured
correctly:
>>>>>
27 Feb 2019 19:12:10  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
27 Feb 2019 19:12:12  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
27 Feb 2019 19:12:12  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - indexDir=org/apache/ctakes/dictionary/lookup/drug_index  exists.
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - Loading Lucene Index into memory: resources/org/apache/ctakes/dictionary/lookup/drug_index
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - Loaded Lucene Index, # docs=5
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - indexDir=org/apache/ctakes/dictionary/lookup/OrangeBook  exists.
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - Loading Lucene Index into memory: resources/org/apache/ctakes/dictionary/lookup/OrangeBook
27 Feb 2019 19:12:12  INFO LuceneIndexReaderResourceImpl - Loaded Lucene Index, # docs=18889
27 Feb 2019 19:12:12  INFO DictionaryLookupAnnotator - Parsing descriptor: {CTAKES_HOME}/resources/org/apache/ctakes/dictionary/lookup/LookupDesc.xml
27 Feb 2019 19:12:13  INFO FirstTokenPermLookupInitializerImpl - Exclusion tagset loaded: [cc, pp, cd, pdt, vbn, vbp, pp$, wdt, wrb, ls, vb, vbz, dt, ex, pos, md, vbd, wp, vbg, to, wps, rp]
<<<<<<<<<

That run was configured for the drug index exclusively, and the log
indicates that the indexes were loaded successfully and that the
DictionaryLookupAnnotator initialized without complaint. There are no
errors or warnings in the log.

However, I get absolutely no results, even when I use the keywords
that are mentioned in the script (e.g., Acetaminophen, Aspirin,
Ibuprofen, Celexa). There is not a single textsem or refsem tag, and
none of the associated codes appear in the output; only the usual
syntactic elements are there. The terms are tagged as nouns, as
expected, and the DictionaryLookupAnnotator is invoked.

It is the same story if I switch to the SNOMED index: none of the
twelve keywords triggers a match.

What am I missing, and what do I need to do to get those dictionaries
to behave? Does anyone have a functional Lucene index?

Thanks,

          Remy Sanouillet