Posted to dev@ctakes.apache.org by "Chen, Pei" <Pe...@childrens.harvard.edu> on 2013/08/15 15:00:37 UTC
RE: umls lookup issue
Hi Samir,
[including the public dev list]
Thanks for opening up a new thread on this issue.
Would you be able to help narrow down the sentence that you believe is causing the NP2LookupWindow to take 3h to process? I can’t seem to reproduce it on my end.
I vaguely remember someone running into something where it could go into a loop, so hopefully maybe they can also chime in…
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Wednesday, August 14, 2013 7:30 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Specifically, it is the NP2LookupWindow that causes the delay.
________________________________
From: samir chabou <sa...@yahoo.com>
To: "Chen, Pei" <Pe...@childrens.harvard.edu>
Sent: Wednesday, August 14, 2013 7:21:18 PM
Subject: Re: umls lookup issue
Hi Pei
When I removed the LookupWindowAnnotator, the run went very fast (less than 1 min), but there were no annotations for EntityMention and EventMention. It looks like there is something wrong with the LookupWindowAnnotator.
Samir
________________________________
From: samir chabou <sa...@yahoo.com>
To: "Chen, Pei" <Pe...@childrens.harvard.edu>
Sent: Wednesday, August 14, 2013 7:11:57 PM
Subject: Re: umls lookup issue
Hi Pei
When I removed the lookupwindowannotation, the run went very fast (less than 1 min), but there were no annotations for EntityMention and EventMention. It looks like there is something wrong with the lookupwindowannotation.
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Sent: Wednesday, August 14, 2013 3:40:46 PM
Subject: RE: umls lookup issue
That is strange- it shouldn’t take that long. I wonder if it’s going into an infinite loop.
Have you tried debugging it? Perhaps removing some of the lines in the note or removing the dictionary lookup component itself?
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Wednesday, August 14, 2013 1:14 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei,
Unfortunately, removing the DependencyParser and Assertion did not make a difference (it had been running for 1h, so I stopped it). I think the bottleneck is the LookupWindowAnnotator: yesterday, while it was running, the console showed the LookupWindowAnnotator annotations, and it took quite some time to go from one LookupWindow to another. Also, these LookupWindow annotations were done twice.
Memory: Xms500M and Xmx1500
JDK: JavaSE-1.6 (jre7)
Below is a screen capture showing where I got the memory and JDK info, plus the structure of AggregatePlaintextUMLSProcessor.xml without the DependencyParser and Assertion.
Thanks a lot
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Sent: Wednesday, August 14, 2013 10:08:00 AM
Subject: RE: umls lookup issue
Hi Samir,
It shouldn’t take 3h… it’s a bit strange. cTAKES is constrained much more by memory than by CPU. Do you know which JDK and which Java memory settings were used?
Could you also try removing the new annotators that were added in 3.0? DependencyParser, Assertion Module. See attached as an example.
--Pei
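One way to answer the JDK and memory question is to print the values from inside the same JVM that runs the pipeline. The following is a minimal sketch and not part of cTAKES; the class name JvmSettingsReport and its methods are hypothetical. It relies only on the standard Runtime and System.getProperty calls.

```java
import java.util.Locale;

// Hypothetical helper (not part of cTAKES): reports the Java version and heap
// limits of the JVM it runs in, so they can be compared against -Xms/-Xmx.
class JvmSettingsReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of the Java version and heap sizes. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.format(Locale.ROOT,
                "java.version=%s maxHeapMiB=%d totalHeapMiB=%d",
                System.getProperty("java.version"),
                toMiB(rt.maxMemory()),     // upper bound the JVM will try to use
                toMiB(rt.totalMemory()));  // heap currently reserved by the JVM
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same launch arguments as the pipeline shows whether those settings were actually picked up.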
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Tuesday, August 13, 2013 10:48 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei
I tried the clinical pipeline as is, with no modification except for the UMLS username and password. It took more than 5h on my laptop to process the text sample that I sent to you. Then I thought maybe my laptop was not powerful enough, so I tried it on another laptop (i7, 16M, 2.4Mhz), but again it took more than 3h. If you ran it within 5 minutes, I was wondering what your environment was.
As a next step, as you suggested, I will try to create a local MySQL db for umls2011ab and process the text. But again, it is strange that in cTAKES 2.5 this same test took less than one minute.
Thanks a lot for your cooperation; your help was appreciated.
Re: umls lookup issue
Posted by samir chabou <sa...@yahoo.com>.
Hi Pei
1- The attached abstract file is what I used as a sample.
2- The attached AggregatePlaintextUMLSProcessor file is the .xml configuration (note: even when I removed the DependencyParser, SemanticRoleLabeler, AssertionAnnotator and ExtractionPrepAnnotator, performance did not change).
Thank you very much for your help
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Cc: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Sent: Thursday, August 15, 2013 10:23:50 PM
Subject: RE: umls lookup issue
Hi Samir,
Do you have a sample sentence that causes the 3hr run?
Also, could you attach the AggregatePipeline.xml configuration used? Someone else on the dev list may have already encountered this in the past.
I'll try and see if I can recreate it.
--Pei
________________________________
From: samir chabou [samirchb@yahoo.com]
Sent: Thursday, August 15, 2013 7:07 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei,
We did more debugging, and it is the lookup call below (highlighted in yellow) that causes the delay.
performLookup is in DictionaryLookupAnnotator.java:
private void performLookup(JCas jcas, LookupSpec ls, List lookupTokenList,
        Map ctxMap) throws Exception
{
    // sort the lookup tokens
    Collections.sort(lookupTokenList, LookupTokenComparator.getInstance());
    // perform lookup
    Collection lookupHitCol = null;
    LookupAlgorithm la = (LookupAlgorithm) ls.getLookupAlgorithm();
    lookupHitCol = la.lookup(lookupTokenList, ctxMap);
Samir
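To confirm that a single call such as the lookup above dominates the run time, the call can be wrapped in a small timer. This is only a sketch under stated assumptions: CallTimer, Timed and time are hypothetical names, not cTAKES classes, and the workload in main is a stand-in for the real la.lookup(lookupTokenList, ctxMap) call.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper (not part of cTAKES): measures how long one call takes.
class CallTimer {

    /** The value returned by the timed call plus the elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the supplier once and records how long it took. */
    static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timed<>(value, elapsed);
    }

    public static void main(String[] args) {
        // Stand-in workload; in the annotator this would be the lookup call.
        Timed<Long> t = time(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        });
        System.out.println("result=" + t.value + " elapsedMillis=" + t.elapsedMillis);
    }
}
```

Logging the elapsed time per lookup window would show whether every window is slow or only a few are.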
Re: pico pipeline
Posted by samir chabou <sa...@yahoo.com>.
Hi Pei, thanks for your feedback.
Yes, there were some typos. I also added some cTAKES annotators to the PICO pipeline (see attached).
Our purpose is:
1) To improve PICO recognition in abstract text compared to what we are currently testing with MetaMap.
2) In my reading of some articles, some authors find that PICO is a useful organizing structure for clinical questions, while others suggest it is less suitable for Diagnosis or Prognosis. From the few PICO cases that I have seen (I need to see a good sample of PICO Diagnosis and Prognosis cases), I think:
a. In the case of Diagnosis and Prognosis, the focus of the question is most likely the Outcome (O) (this needs to be confirmed). Since the Outcome usually tends to be the relation between the Problem and the Intervention (I -- O --> P or C -- O --> P), one way to enhance the PICO structure for Prognosis and Diagnosis is to enhance relation recognition in the text. To do so, we are planning to write code/logic to add some relations which will help the recognition of the O aspect.
b. I also noticed that these suggestions about PICO for Prognosis were advanced before 2006, when the focus of the research was on named entity recognition (NER) and little had been done on relation recognition. I think that since then, NLP techniques have evolved considerably and become more efficient at relation recognition, which will help our purpose.
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>; "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Sent: Friday, August 23, 2013 10:41:46 AM
Subject: RE: pico pipeline
Hi Samir,
Perhaps others can chime in as well, as I'm not too familiar with the proposed pipeline.
But it looks really interesting, especially the higher level components such as the Intervention Annotator (are you planning to write code/logic to 'infer' what the cause or prognosis was, etc.?).
Yes, one can certainly use the already annotated data from the existing components. Some notes:
- SentenceDetector and Tokenizer are listed twice in the pipeline (is that just a typo?)
- I think there may be other components that you may want to include which might help with your higher level annotators (some of which may or may not be available in MetaMap):
o Assertion (Negation, Subject, History-of, etc.)
o Co-reference
o Semantic Role Labeler
o Temporal?
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Thursday, August 22, 2013 11:43 AM
To: Chen, Pei; dev@ctakes.apache.org
Subject: pico pipeline
Hi Pei,
I'm trying to use cTAKES to annotate PICO question concepts. I have attached the pipeline that I plan to construct to do so. Could you please have a quick look at the attached file and tell me if I'm on the right track, or if you have any suggestions?
Thanks a lot
Samir
RE: umls lookup issue
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hi Samir,
That is awesome news.. glad to hear it worked out for you.
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Monday, August 19, 2013 12:58 PM
To: Chen, Pei; dev@ctakes.apache.org
Subject: Re: umls lookup issue
Hi Pei,
I have good news for you :) the issue is resolved. The problem was the missing umls2011ab resource, which was not found in the resource folder.
I think (I'm not sure) cTAKES tries to run the query against the umls2011ab resource, and when it does not find it, it tries to run the query on the UMLS server side, and that takes time.
This morning I added the umls2011ab resource, and I'm now able to run the test within 2 min.
Also, over the weekend I created a local db for the two tables umls_ms_2011ab and umls_snomed_map and loaded them with a sample of data. The same test takes less than 1 min on the local db.
Thanks a lot for your support
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; samir chabou <sa...@yahoo.com>
Sent: Monday, August 19, 2013 10:40:34 AM
Subject: RE: umls lookup issue
Hi Samir,
I ran your attached DefaultAggregateUMLSPipleine and abstract text file (using the trunk codebase and the -XX:+UseConcMarkSweepGC -Xms500M -Xmx1600M args).
It took about 2min16secs (see attached results output) to finish.
At first glance, it doesn’t appear to be a loop or a bug to me; it seems more like a local setup/configuration issue…
Re: ctakes-resource-umls2011ab, did the system print out an error/exception message? If it was missing, it should have thrown an exception rather than hanging for 3hrs and eventually finishing.
--Pei
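The fail-fast behaviour described above (throw an exception when a required resource is missing instead of continuing slowly) can be sketched as a startup check. This is an illustration only, not how cTAKES loads its dictionary: the class ResourceCheck, the method requireReadable and the default path resources/umls2011ab are all hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical startup check (not part of cTAKES): stop immediately when a
// required local resource is missing instead of discovering it hours later.
class ResourceCheck {

    /** Returns the path if it exists and is readable, otherwise throws. */
    static Path requireReadable(Path resource) {
        if (!Files.isReadable(resource)) {
            throw new IllegalStateException(
                    "Required resource not found or not readable: "
                            + resource.toAbsolutePath());
        }
        return resource;
    }

    public static void main(String[] args) {
        // Assumed location for illustration; pass the real path as an argument.
        Path resource = Paths.get(args.length > 0 ? args[0] : "resources/umls2011ab");
        try {
            requireReadable(resource);
            System.out.println("Resource OK: " + resource.toAbsolutePath());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

A check like this, run before the pipeline starts, turns a silent multi-hour slowdown into an immediate and readable error.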
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Sunday, August 18, 2013 1:35 PM
To: dev@ctakes.apache.org
Subject: Re: umls lookup issue
Hi Pei,
I'm sorry if I bothered you a bit with my UMLS lookup issue. I just noticed that I have an error in the pom of the lookup dictionary project, which may be a clue to my problem. Could you please have a quick look at the attached file, where I put the details of the error? It looks as if I'm missing some kind of project, ctakes-resource-umls2011ab.
Thanks Samir
RE: umls lookup issue
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hi Samir,
I ran your attached DefaultAggregateUMLSPipleine and abstract text file (Using the trunk codebase and the -XX:+UseConcMarkSweepGC -Xms500M -Xmx1600M args) .
It took about 2min16secs (see attached results output) to finish.
From the initial looks, it doesn’t appear to be a loop or a bug to me and seems more like a local setup/configuration issue…
Re: ctakes-resource-umls2011ab, did the system print out an error/exception message? If it was missing, it should have thrown an exception and not hang for 3hrs but eventually finish.
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Sunday, August 18, 2013 1:35 PM
To: dev@ctakes.apache.org
Subject: Re: umls lookup issue
Hi Pei,
i'm sorry if i bothered you a bit with my umls lookup issue. I just noticed that I have an error in the pom of the lookup dictionary project, that may a be a clue to my problem. Can you please have a fast look to the attached file where i put the details of the error - it looks as if i'm missing a kind of project ctakes-resource-umls2011ab.
Thanks Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Cc: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Sent: Thursday, August 15, 2013 10:23:50 PM
Subject: RE: umls lookup issue
Hi Samir,
Do you have a sample sentence that causes the 3hr run?
Also could you attach the AggregatePipeline.xml configuration used? In case, someone else on the dev list may have encountered this in the past already.
I'll try and see if I can recreate it.
--Pei
________________________________
From: samir chabou [samirchb@yahoo.com<ma...@yahoo.com>]
Sent: Thursday, August 15, 2013 7:07 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei,
we did more debuging and it's the lookup call below (higlighted in yelleow) that causes the delay.
performLookup is in DictionaryLookupAnnotator.java
private void performLookup(JCas jcas, LookupSpec ls, List lookupTokenList,
Map ctxMap) throws Exception
{
// sort the lookup tokens
Collections.sort(lookupTokenList, LookupTokenComparator.getInstance() );
// perform lookup
Collection lookupHitCol = null;
LookupAlgorithm la = (LookupAlgorithm) ls.getLookupAlgorithm();
lookupHitCol = la.lookup(lookupTokenList, ctxMap);
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>>
To: "dev@ctakes.apache.org<ma...@ctakes.apache.org>" <de...@ctakes.apache.org>>
Cc: samir chabou <sa...@yahoo.com>>
Sent: Thursday, August 15, 2013 9:00:37 AM
Subject: RE: umls lookup issue
Hi Samir,
[including the public dev list]
Thanks for opening up a new thread on this issue.
Would you be able to help narrow down the sentence that you believe is causing the NP2LookupWindow to take 3h to process? I can’t seem to reproduce it on my end.
I vaguely remember someone running into something where it could go into a loop, so hopefully maybe they can also chime in…
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Wednesday, August 14, 2013 7:30 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Specifically, it is the NP2LookupWindow that causes the delay.
________________________________
From: samir chabou <sa...@yahoo.com>
To: "Chen, Pei" <Pe...@childrens.harvard.edu>
Sent: Wednesday, August 14, 2013 7:21:18 PM
Subject: Re: umls lookup issue
Hi Pei
I removed the LookupWindowAnnotator and it ran very fast, in less than 1 min, but there were no annotations for EntityMention and EventMention. It looks like there is something wrong with the LookupWindowAnnotator.
Samir
________________________________
From: samir chabou <sa...@yahoo.com>
To: "Chen, Pei" <Pe...@childrens.harvard.edu>
Sent: Wednesday, August 14, 2013 7:11:57 PM
Subject: Re: umls lookup issue
Hi Pei
I removed the lookupwindowannotation and it ran very fast, in less than 1 min, but there were no annotations for EntityMention and EventMention. It looks like there is something wrong with the lookupwindowannotation.
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Sent: Wednesday, August 14, 2013 3:40:46 PM
Subject: RE: umls lookup issue
That is strange; it shouldn’t take that long. I wonder if it’s going into an infinite loop.
Have you tried debugging it? Perhaps removing some of the lines in the note or removing the dictionary lookup component itself?
--Pei
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Wednesday, August 14, 2013 1:14 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei,
Unfortunately, removing the DependencyParser and Assertion did not make a difference (it had been running for 1h, so I stopped it). Pei, I think the bottleneck was the LookupWindowAnnotator. Yesterday, while it was running, the console showed the LookupWindowAnnotator annotations, and it took quite some time to go from one LookupWindow to another. Also, these lookup window annotations were done twice.
Memory: Xms500M and Xmx1500
JDK: JavaSE-1.6 (jre7)
Below is a screen capture showing where I got the memory and JDK info, plus the structure of AggregatePlaintextUMLSProcessor.xml without the DependencyParser and Assertion.
Thanks a lot
Samir
________________________________
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: samir chabou <sa...@yahoo.com>
Sent: Wednesday, August 14, 2013 10:08:00 AM
Subject: RE: umls lookup issue
Hi Samir,
It shouldn’t take 3h… it’s a bit strange. cTAKES is constrained much more by memory than by CPU. Do you know which JDK and which Java memory settings were used?
Could you also try removing the new annotators that were added in 3.0 (DependencyParser, Assertion Module)? See attached as an example.
--Pei
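[Editor's sketch, not part of the original message.] One way to answer the JDK and memory question from inside the running process is to print what the JVM itself reports. This is a minimal sketch using only standard JDK calls; the class name JvmSettings is made up for illustration, and maxMemory() only approximates the -Xmx value (it can report slightly less).

```java
public class JvmSettings {

    /** Converts a byte count to whole mebibytes (rounding down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of the Java version and heap sizes seen by this JVM. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "java.version=" + System.getProperty("java.version")
                // Upper bound the heap may grow to; roughly the -Xmx setting.
                + " maxHeapMiB=" + toMiB(rt.maxMemory())
                // Heap currently reserved by the JVM; starts near -Xms and may grow.
                + " totalHeapMiB=" + toMiB(rt.totalMemory());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same options as the pipeline, for example `java -Xms500M -Xmx1500M JvmSettings`, shows whether the settings actually reached the JVM that was launched.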
From: samir chabou [mailto:samirchb@yahoo.com]
Sent: Tuesday, August 13, 2013 10:48 PM
To: Chen, Pei
Subject: Re: umls lookup issue
Hi Pei
I tried the clinical pipeline as is, with no modification except for the UMLS username and password, and it took more than 5h on my laptop to process the text sample that I sent you. Then I thought maybe my laptop was not powerful enough, so I tried it on another laptop (i7, 16M, 2.4Mhz), but again it took more than 3h. If you ran it within 5 minutes, I was wondering what your environment was.
As a next step, as you suggested, I will try to create a local MySQL database for umls2011ab and process the text. But again, it is strange that in cTAKES 2.5 this same test took less than one minute.
Thanks a lot for your cooperation; it is much appreciated.
Re: umls lookup issue
Posted by samir chabou <sa...@yahoo.com>.
Hi Pei,
I'm sorry if I bothered you a bit with my UMLS lookup issue. I just noticed that I have an error in the pom of the lookup dictionary project, which may be a clue to my problem. Could you please have a quick look at the attached file, where I put the details of the error? It looks as if I'm missing a project, ctakes-resource-umls2011ab.
Thanks Samir
RE: umls lookup issue
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hi Samir,
Do you have a sample sentence that causes the 3hr run?
Also, could you attach the AggregatePipeline.xml configuration used, in case someone else on the dev list has already encountered this in the past?
I'll try and see if I can recreate it.
--Pei
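
[Editor's sketch, not part of the original message.] To narrow down which sentence is slow, one option is to time each unit of work separately and report only the ones above a threshold. The sketch below is generic Java and does not use any cTAKES API: SlowWindowFinder, Timing, time, and slowerThan are hypothetical names, and the task passed to time(...) is a placeholder for whatever call is being measured (for example, one lookup per window).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

public class SlowWindowFinder {

    /** Elapsed time for one labelled unit of work (for example, one sentence). */
    public static final class Timing {
        public final String label;
        public final long elapsedMillis;

        public Timing(String label, long elapsedMillis) {
            this.label = label;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the task once and records how long it took in milliseconds. */
    public static Timing time(String label, Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        return new Timing(label, elapsedMillis);
    }

    /** Keeps only the timings at or above the given threshold. */
    public static List<Timing> slowerThan(List<Timing> timings, long thresholdMillis) {
        List<Timing> slow = new ArrayList<Timing>();
        for (Timing t : timings) {
            if (t.elapsedMillis >= thresholdMillis) {
                slow.add(t);
            }
        }
        return slow;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input; a real run would iterate over the sentences of the note.
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        List<Timing> timings = new ArrayList<Timing>();
        for (final String sentence : sentences) {
            // Placeholder work: replace the lambda body with the call being measured.
            timings.add(time(sentence, () -> sentence.length()));
        }
        for (Timing t : slowerThan(timings, 0L)) {
            System.out.println(t.elapsedMillis + " ms\t" + t.label);
        }
    }
}
```

With a threshold of a few seconds, the printed labels would point at the sentences worth posting to the list as a reproducible sample.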