Posted to user@uima.apache.org by Ahmed Abdeen Hamed <ah...@gmail.com> on 2008/06/23 20:19:20 UTC

ConceptMapper: Performance Matrix

Hello UIMA members,

I am using the document analyzer example to analyze large files against
multiple dictionaries. One of the raw files is 7.5MB. There are 13
dictionaries, each 1MB in size. Is there some sort of matrix that can be
used to predict the execution time? Has anyone written a paper on the
performance analysis of ConceptMapper?
Please let me know if you can.
Best wishes,
--------------------------------------------------------
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
MBLWHOI Library
Marine Biological Laboratory
7 MBL Street Woods Hole, MA 02543 USA
+1 508 289 7676
--
email: abdeen@mbl.edu
--

Re: ConceptMapper: Performance Matrix

Posted by Ahmed Abdeen Hamed <ah...@gmail.com>.
Just to be clear on the dictionary compilation: when you compile a
dictionary file, does it actually produce an output file? If so, where does
it get stored? The command-line tool takes only two params and doesn't ask
for an output param. Do I then point at that output file in the primitive AE,
or still point at the original dictionary?
Thanks again!
Ahmed


Re: ConceptMapper: Performance Matrix

Posted by Ahmed Abdeen Hamed <ah...@gmail.com>.
Great, I will combine the dictionaries then. It's also good to know that
compiled dictionaries make a difference when dictionary loading is the
bottleneck. Have a good vacation.
Ahmed



Re: ConceptMapper: Performance Matrix

Posted by Michael Tanenblatt <sl...@park-slope.net>.
Yes, CompileDictionary.java will do it. But if dictionary loading time  
is not the problem, I wouldn't bother doing that as it will not buy  
you much. Combining the dictionaries, for now, should make the biggest  
difference.
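
One way to check which phase is actually the bottleneck is to time each one separately. The following is only a minimal sketch in plain Java: PhaseTimer and its methods are hypothetical names, not part of UIMA or ConceptMapper, and the two tasks you pass in (one that initializes the analysis engine and loads the dictionaries, one that processes the documents) are assumed to be written by you.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for comparing two phases of a pipeline run.
class PhaseTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Names whichever phase took longer; ties are reported as document analysis. */
    static String slowerPhase(long loadMillis, long analyzeMillis) {
        return loadMillis > analyzeMillis ? "dictionary loading" : "document analysis";
    }
}
```

If loading dominates, compiling the dictionary is the change to try; if analysis dominates, merging the dictionaries is. A single run is a rough measurement, so it is worth repeating it a few times before drawing conclusions.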



Re: ConceptMapper: Performance Matrix

Posted by Ahmed Abdeen Hamed <ah...@gmail.com>.
Thanks Michael. Dictionary processing time is reasonable. It's the
document analyzer execution time that is the bottleneck. I will merge the
dictionaries and compile them as you suggested. However, I am not sure which
command line tool you are referring to. Do you mean:
org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
Thanks for the vacation heads up.
Ahmed


Re: ConceptMapper: Performance Matrix

Posted by Michael Tanenblatt <sl...@park-slope.net>.
The short answer is "no". Not yet, anyway.

But, here are some things that might help. First, if dictionary  
loading times are long, you can use the command line tool supplied in  
the package to compile the dictionary, and use the compiled  
dictionary. If you do this, remember that you will need to change the  
AE descriptors to use the correct implementation of the dictionary  
loader, e.g.:

<externalResource>
	...
	<implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
	...
</externalResource>

That said, if you are using 13 dictionaries, that means you are
running 13 copies of ConceptMapper in your pipeline, which means that
you are traversing each file's text 13 times just for your
ConceptMapper invocations. If you could merge the dictionaries into
one, you should see a marked speedup. Clearly, a near-term
enhancement of ConceptMapper would be to enable the loading of
multiple dictionaries, which would get merged at initialization time.
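
To put a number on it: 13 passes over a 7.5MB file means roughly 97.5MB of text scanned, versus 7.5MB with a single merged dictionary. Merging can be done offline before running the pipeline. What follows is a rough sketch only, under the assumption that each dictionary is an XML file whose root element holds the entries as direct child elements. DictionaryMerger is a hypothetical helper, not something shipped with ConceptMapper, and it uses nothing beyond the standard Java XML APIs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of ConceptMapper.
class DictionaryMerger {

    /**
     * Copies every child element of each input's root under one new root.
     * The new root reuses the element name of the first input's root;
     * attributes on the original roots are not carried over.
     */
    static Document merge(List<Document> inputs) throws Exception {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no dictionaries given");
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        String rootName = inputs.get(0).getDocumentElement().getNodeName();
        Element root = merged.createElement(rootName);
        merged.appendChild(root);
        for (Document input : inputs) {
            Node child = input.getDocumentElement().getFirstChild();
            while (child != null) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // importNode makes a deep copy owned by the merged document
                    root.appendChild(merged.importNode(child, true));
                }
                child = child.getNextSibling();
            }
        }
        return merged;
    }

    /** Parses the input files, merges them, and writes the result to output. */
    static void mergeFiles(File output, List<File> inputFiles) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> inputs = new ArrayList<>();
        for (File f : inputFiles) {
            inputs.add(builder.parse(f));
        }
        Transformer writer = TransformerFactory.newInstance().newTransformer();
        writer.setOutputProperty(OutputKeys.INDENT, "yes");
        writer.transform(new DOMSource(merge(inputs)), new StreamResult(output));
    }
}
```

This sketch does not deduplicate entries that appear in more than one dictionary, and it does not record which source dictionary an entry came from. If the annotations need that distinction, each entry would have to carry it in an attribute of its own before merging.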

One side note: I am going to be on vacation starting on June 25 and  
will only have occasional access to email until I return on July 12. I  
will try to answer questions during that time when I do have access,  
but I really have no idea how often that will be.

