Posted to dev@uima.apache.org by "Mike Barborak (JIRA)" <de...@uima.apache.org> on 2013/06/20 21:58:19 UTC

[jira] [Created] (UIMA-3017) Getting feature value from feature structure longer than expected

Mike Barborak created UIMA-3017:
-----------------------------------

             Summary: Getting feature value from feature structure longer than expected
                 Key: UIMA-3017
                 URL: https://issues.apache.org/jira/browse/UIMA-3017
             Project: UIMA
          Issue Type: Improvement
          Components: Core Java Framework
    Affects Versions: 2.3
         Environment: Linux x86_64
            Reporter: Mike Barborak
            Priority: Minor


Should getting the value of a feature from a feature structure be fast? Intuitively, I would expect it to perform about as well as getting an entry from a Java HashMap, or better, but in my experiments it is about 8 times slower. To work around this, I wrap my feature structures in caching Java code, but there may be an opportunity to speed up UIMA generally.
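The caching wrapper itself is not included in this report. As a rough illustration of the idea only, here is a minimal sketch: the class name CachedLookup and its methods are hypothetical, and the Function passed to the constructor stands in for a slower call such as fs.getStringValue(feature). It is not the wrapper actually used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper (not part of UIMA): caches values returned by a slower lookup.
// The Function stands in for a call such as fs.getStringValue(feature).
final class CachedLookup {
  private final Function<String, String> slowLookup;
  private final Map<String, String> cache = new HashMap<>();
  private int misses = 0;

  CachedLookup(Function<String, String> slowLookup) {
    this.slowLookup = slowLookup;
  }

  String get(String featureName) {
    // containsKey keeps a cached null distinct from "not cached yet".
    if (cache.containsKey(featureName)) {
      return cache.get(featureName);
    }
    misses++;
    String value = slowLookup.apply(featureName);
    cache.put(featureName, value);
    return value;
  }

  // Must be called whenever the underlying value changes,
  // otherwise get() keeps returning the stale cached value.
  void invalidate(String featureName) {
    cache.remove(featureName);
  }

  // Number of times the slow lookup was actually invoked.
  int misses() {
    return misses;
  }

  public static void main(String[] args) {
    CachedLookup cached = new CachedLookup(name -> "myString");
    cached.get("myFeature");
    cached.get("myFeature");
    System.out.println(cached.misses()); // prints 1: the second call is served from the cache
  }
}
```

The cost of this approach is that every write to the underlying feature has to invalidate or update the cache, which is why a faster getter in the framework itself would be preferable.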

My test creates a CAS with a single feature structure in it. It sets a string feature on that feature structure and then gets the value of that feature in a tight loop. I compare that to an instance of a Java class backed by an internal HashMap of strings to strings, where a method on the instance gets an entry from the map in a tight loop.

I run 5 rounds of each loop, with 100,000,000 iterations per round. The total times for the rounds involving the CAS were:

round 0 total time 1: 7.520104509s
round 1 total time 1: 6.812214938s
round 2 total time 1: 6.882752307s
round 3 total time 1: 6.728515004s
round 4 total time 1: 6.813674956s

The total times for the rounds just using the Java class were:

round 0 total time 2: 0.847296054s
round 1 total time 2: 0.814570347s
round 2 total time 2: 0.814399859s
round 3 total time 2: 0.814189383s
round 4 total time 2: 0.814979357s
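
To make the comparison concrete, the totals can be divided by the 100,000,000 iterations per round. That gives roughly 70 ns per call for the CAS and roughly 8 ns per call for the HashMap, a ratio between 8 and 9. The following sketch reproduces that arithmetic; the class and method names are illustrative only, and the numbers are copied from the two listings above.

```java
import java.util.Arrays;

// Illustrative arithmetic only: turns the reported round totals into
// an average per-call cost and a slowdown ratio.
final class SlowdownEstimate {
  // Round totals in seconds, copied from the two listings above.
  static final double[] CAS_ROUNDS =
      {7.520104509, 6.812214938, 6.882752307, 6.728515004, 6.813674956};
  static final double[] MAP_ROUNDS =
      {0.847296054, 0.814570347, 0.814399859, 0.814189383, 0.814979357};

  // Iterations per round, as used in the test program.
  static final long ITERATIONS = 100_000_000L;

  static double mean(double[] values) {
    return Arrays.stream(values).average().orElse(Double.NaN);
  }

  // Average cost of a single call in nanoseconds.
  static double nanosPerCall(double[] rounds) {
    return mean(rounds) / ITERATIONS * 1e9;
  }

  // How many times slower the CAS loop is than the HashMap loop.
  static double slowdown() {
    return mean(CAS_ROUNDS) / mean(MAP_ROUNDS);
  }

  public static void main(String[] args) {
    System.out.printf("CAS:      %.1f ns/call%n", nanosPerCall(CAS_ROUNDS));
    System.out.printf("HashMap:  %.1f ns/call%n", nanosPerCall(MAP_ROUNDS));
    System.out.printf("Slowdown: %.1fx%n", slowdown());
  }
}
```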

Here is my Java code:

{code:title=MyTest.java}
package test;

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.cas.Type;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.XMLInputSource;

public class MyTest {
  
  static class MyClass {
    Map<String, String> myFeatures = new HashMap<String, String>();
    
    void setStringValue(String feature, String value) {
      myFeatures.put(feature, value);
    }
    
    String getStringValue(String feature) {
      return myFeatures.get(feature);
    }
  }
  
  public static void main(String[] argv) throws Exception {
    // Load the type system descriptor from the classpath.
    InputStream stream = MyTest.class.getClassLoader().getResourceAsStream("MyTypes.xml");
    TypeSystemDescription typeSystemDescription = UIMAFramework.getXMLParser().parseTypeSystemDescription(new XMLInputSource(stream, null));
    CAS cas = CasCreationUtils.createCas(typeSystemDescription, null, null);
    Type myType = cas.getTypeSystem().getType("MyType");
    FeatureStructure fs = cas.createFS(myType);
    Feature myFeature = myType.getFeatureByBaseName("myFeature");
    fs.setStringValue(myFeature, "myString");
    cas.addFsToIndexes(fs);
    
    MyClass myInstance = new MyClass();
    myInstance.setStringValue("myFeature2", "myString2");
    
    long iterations = 100000000;
    double nanoSecsPerSec = 1000000000.0d;
    
    for (int round = 0; round < 5; round++) {
      long start = System.nanoTime();
      for (long i = 0; i < iterations; i++) {
        fs.getStringValue(myFeature);
      }
      long end = System.nanoTime();
      System.out.println("round " + round + " total time 1: " + ((end - start) / nanoSecsPerSec) + "s");
    }
      
    for (int round = 0; round < 5; round++) {
      long start = System.nanoTime();
      for (long i = 0; i < iterations; i++) {
        myInstance.getStringValue("myFeature2");
      }
      long end = System.nanoTime();
      System.out.println("round " + round + " total time 2: " + ((end - start) / nanoSecsPerSec) + "s");
    }
  }

}
{code}
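
One caveat about loops like these: neither loop uses the value it reads, so the JIT compiler may be free to optimize some of the work away, which can skew either measurement. A minimal sketch of a timing helper that consumes every result follows. The names ConsumingTimer and timeLookups are hypothetical, and the Supplier stands in for either lookup (the CAS getter or the HashMap getter).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical timing helper: folds every looked-up value into a checksum
// so that each call has an observable effect.
final class ConsumingTimer {

  // Returns {elapsed nanoseconds, checksum of result lengths}.
  static long[] timeLookups(Supplier<String> lookup, long iterations) {
    long checksum = 0;
    long start = System.nanoTime();
    for (long i = 0; i < iterations; i++) {
      checksum += lookup.get().length();
    }
    long elapsed = System.nanoTime() - start;
    return new long[] {elapsed, checksum};
  }

  public static void main(String[] args) {
    Map<String, String> features = new HashMap<>();
    features.put("myFeature2", "myString2");

    long[] result = timeLookups(() -> features.get("myFeature2"), 1_000_000L);
    // Printing the checksum keeps it live after the loop.
    System.out.println("elapsed ns: " + result[0] + ", checksum: " + result[1]);
  }
}
```

The Supplier indirection adds its own small overhead to both measurements, so only the relative difference is meaningful. A dedicated harness such as JMH would be a more rigorous way to run the same comparison.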

Here is my type descriptor:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>MyTypes</name>
  <description/>
  <version>1.0</version>
  <vendor/>
  <types>
    <typeDescription>
      <name>MyType</name>
      <description/>
      <supertypeName>uima.cas.TOP</supertypeName>
      <features>
        <featureDescription>
          <name>myFeature</name>
          <description></description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
{code}

