Posted to dev@uima.apache.org by "Marshall Schor (JIRA)" <ui...@incubator.apache.org> on 2008/06/23 21:55:45 UTC

[jira] Created: (UIMA-1089) Space/Time tradeoffs in the CAS

Space/Time tradeoffs in the CAS
-------------------------------

                 Key: UIMA-1089
                 URL: https://issues.apache.org/jira/browse/UIMA-1089
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.2.2
            Reporter: Marshall Schor
            Priority: Minor


Investigate and implement optimizations that trade user-controllable time (spent running the optimization) for space.  One such optimization is string sharing: detecting the sharing opportunities requires additional computation and temporary storage, but results in space savings.  For instance, a common annotation might assign short strings like "noun" to a "part-of-speech" feature.  When processing a large document, there may be a large number of such string-valued features, picked from a small pool of allowable values.  In this case the CAS's string storage could be optimized to share the string references, at the cost of temporarily creating a hash table of the unique strings and using it to identify sharing possibilities.  A new API call for this optimization would confine its time and space overhead to just those users and times where it makes sense.
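
The string-sharing idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the CAS implementation: the class name StringSharingSketch, the method shareStrings, and the plain String[] standing in for the CAS's string storage are all hypothetical, and the code uses Java 8 conveniences (Map.putIfAbsent, the diamond operator) for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not part of the UIMA API. */
final class StringSharingSketch {

    /**
     * Replaces equal strings in the array with one shared instance.
     * The array is an assumed stand-in for the CAS's string storage.
     *
     * @return the number of slots redirected to an earlier, equal string
     */
    static int shareStrings(String[] stringHeap) {
        // Temporary table of the unique strings seen so far. It is only
        // needed while the optimization runs and becomes garbage afterwards.
        Map<String, String> unique = new HashMap<>();
        int redirected = 0;
        for (int i = 0; i < stringHeap.length; i++) {
            String s = stringHeap[i];
            if (s == null) {
                continue; // unset feature values are left alone
            }
            // putIfAbsent returns the previously stored equal string,
            // or null if this is the first time the value is seen.
            String canonical = unique.putIfAbsent(s, s);
            if (canonical != null && canonical != s) {
                stringHeap[i] = canonical; // share the existing instance
                redirected++;
            }
        }
        return redirected;
    }
}
```

Because the work happens only when shareStrings is called, the extra time and the temporary hash table are paid only by callers who ask for the optimization, which is the point of exposing it as an explicit API call.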

An alternative would be to automatically figure this out for some selected kinds of optimizations, but I'm not sure that could be done without impacting finely-tuned systems negatively.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Created: (UIMA-1089) Space/Time tradeoffs in the CAS

Posted by Marshall Schor <ms...@schor.com>.
Thilo Goetz wrote:
> Marshall,
>
> I'm not sure what you're doing here.  Why don't you just
> start discussion threads on the mailing list?  Why do these
> things need to be in Jira?
I thought the reason to put these in Jira was to "track" them so they 
don't get lost.  It seemed like a good idea to me.  The discussion can 
take place as Jira comments, and later can be easily located.  I don't 
have a strong preference, though. 
-Marshall

Re: [jira] Created: (UIMA-1089) Space/Time tradeoffs in the CAS

Posted by Thilo Goetz <tw...@gmx.de>.

Marshall,

I'm not sure what you're doing here.  Why don't you just
start discussion threads on the mailing list?  Why do these
things need to be in Jira?

--Thilo


[jira] Updated: (UIMA-1089) Space/Time tradeoffs in the CAS

Posted by "Marshall Schor (JIRA)" <ui...@incubator.apache.org>.
     [ https://issues.apache.org/jira/browse/UIMA-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall Schor updated UIMA-1089:
---------------------------------

    Affects Version/s: 2.3

Defer beyond 2.3.0.

> Space/Time tradeoffs in the CAS
> -------------------------------
>
>                 Key: UIMA-1089
>                 URL: https://issues.apache.org/jira/browse/UIMA-1089
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.2.2, 2.3
>            Reporter: Marshall Schor
>            Priority: Minor
