
Posted to issues@opennlp.apache.org by "Assaf Urieli (JIRA)" <ji...@apache.org> on 2011/05/12 11:30:47 UTC

[jira] [Created] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

OpenNLP Maxent miscalculates for real values < 1
------------------------------------------------

                 Key: OPENNLP-170
                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
             Project: OpenNLP
          Issue Type: Bug
          Components: Maxent
    Affects Versions: maxent-3.0.0-sourceforge
         Environment: Windows 7, Java 1.6
            Reporter: Assaf Urieli


When using predicates with real values, training with predA=0.1 predB=0.2 gives different results than training with predA=10 predB=20.
However, using predA=1 predB=2 gives the same results as predA=10 predB=20.

Test below:
package openMaxentTest;

import java.io.StringReader;
import junit.framework.TestCase;

import opennlp.maxent.GIS;
import opennlp.maxent.PlainTextByLineDataStream;
import opennlp.maxent.RealBasicEventStream;
import opennlp.model.EventStream;
import opennlp.model.MaxentModel;
import opennlp.model.OnePassRealValueDataIndexer;
import opennlp.model.RealValueFileEventStream;


public class ScaleDoesntMatterTest extends TestCase {

	/**
	 * This test sets out to prove that the scale you use on real valued predicates
	 * doesn't matter when it comes to the probability assigned to each outcome.
	 * Strangely, if we use (1,2) and (10,20) there's no difference.
	 * If we use (0.1,0.2) and (10,20) there is a difference.
	 * @throws Exception
	 */
	public void testScaleResults() throws Exception {
		String smallValues = "predA=0.1 predB=0.2 A\n" +
				"predB=0.3 predA=0.1 B\n";
		
		String smallTest = "predA=0.2 predB=0.2";
		
		String largeValues = "predA=10 predB=20 A\n" +
				"predB=30 predA=10 B\n";
		
		String largeTest = "predA=20 predB=20";
		
		StringReader smallReader = new StringReader(smallValues);
		EventStream smallEventStream = new RealBasicEventStream(new PlainTextByLineDataStream(smallReader));

		MaxentModel smallModel = GIS.trainModel(2, new OnePassRealValueDataIndexer(smallEventStream,0), false);
		String[] contexts = smallTest.split(" ");
		float[] values = RealValueFileEventStream.parseContexts(contexts);
		double[] ocs = smallModel.eval(contexts, values);
		
		String smallResults = smallModel.getAllOutcomes(ocs);
		System.out.println("smallResults: " + smallResults);
		
		StringReader largeReader = new StringReader(largeValues);
		EventStream largeEventStream = new RealBasicEventStream(new PlainTextByLineDataStream(largeReader));

		MaxentModel largeModel = GIS.trainModel(2, new OnePassRealValueDataIndexer(largeEventStream,0), false);
		contexts = largeTest.split(" ");
		values = RealValueFileEventStream.parseContexts(contexts);
		ocs = largeModel.eval(contexts, values);
		
		String largeResults = largeModel.getAllOutcomes(ocs);
		System.out.println("largeResults: " + largeResults);
		
		assertEquals(smallResults, largeResults);
		
	}
}

The problem concerns the correctionConstant in GISTrainer, which is set to be an integer. I implemented the following fix in class GISTrainer:
    // determine the correction constant and its inverse
    //int correctionConstant = 1;
    float correctionConstant = 0;
    for (int ci = 0; ci < contexts.length; ci++) {
      if (values == null || values[ci] == null) {
        if (contexts[ci].length > correctionConstant) {
          correctionConstant = contexts[ci].length;
        }
      }
      else {
        float cl = values[ci][0];
        for (int vi=1;vi<values[ci].length;vi++) {
          cl+=values[ci][vi];
        }
        
        if (cl > correctionConstant) {
          //correctionConstant=(int) Math.ceil(cl);
          correctionConstant= cl;
        }
      }
    }

I'd be curious to know if there's a reason for using an integer correctionConstant.
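
To illustrate why the integer constant might break scale invariance, here is a minimal standalone sketch. It does not use the OpenNLP classes; the class and method names (CorrectionConstantSketch, integerConstant, floatConstant) are hypothetical, and the integer variant is only an assumption reconstructed from the commented-out lines in the fix above (start at 1, round each sum up with Math.ceil).

```java
// Hypothetical sketch, not OpenNLP code: compares an integer correction
// constant (assumed old behaviour) with a float one (the proposed fix).
class CorrectionConstantSketch {

    // Assumed old behaviour: start at 1 and round the largest
    // per-context value sum up to the next integer.
    static int integerConstant(float[][] values) {
        int c = 1;
        for (float[] row : values) {
            float sum = 0f;
            for (float v : row) {
                sum += v;
            }
            if (sum > c) {
                c = (int) Math.ceil(sum);
            }
        }
        return c;
    }

    // Proposed behaviour: keep the largest per-context value sum as a float.
    static float floatConstant(float[][] values) {
        float c = 0f;
        for (float[] row : values) {
            float sum = 0f;
            for (float v : row) {
                sum += v;
            }
            if (sum > c) {
                c = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Same training contexts as in the test, at three scales.
        float[][] small = {{0.1f, 0.2f}, {0.3f, 0.1f}};
        float[][] unit  = {{1f, 2f}, {3f, 1f}};
        float[][] large = {{10f, 20f}, {30f, 10f}};

        System.out.println("int:   " + integerConstant(small) + " "
                + integerConstant(unit) + " " + integerConstant(large));
        System.out.println("float: " + floatConstant(small) + " "
                + floatConstant(unit) + " " + floatConstant(large));
    }
}
```

Under these assumptions the integer variant yields 1, 4 and 40 for the three scales. Going from (1,2) to (10,20) scales the constant by 10, just like the values, but going from (0.1,0.2) to (10,20) scales it by 40 while the values scale by 100. The float variant keeps the constant proportional to the values (roughly 0.4, 4 and 40), which would be consistent with the behaviour the test reports.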

Rgds,
Assaf Urieli

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Assaf Urieli (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053777#comment-13053777 ] 

Assaf Urieli commented on OPENNLP-170:
--------------------------------------

For now, I've created a branch on GitHub that resolves this issue:
https://github.com/urieli/OpenNLP-Maxent-Joliciel

The commit listing the changes (+ tests to prove they work) is at:
https://github.com/urieli/OpenNLP-Maxent-Joliciel/commit/3c7d2b1563443110b3f4e83d32acb2191efe8866

Rgds,
Assaf Urieli
PhD student, Natural Language Processing
Université de Toulouse le Mirail
http://www.joli-ciel.com


> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Jason Baldridge
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>

[jira] [Closed] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-170.
----------------------------------

    Resolution: Fixed

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Joern Kottmann
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: GISTrainer.java, GISTrainerChangeLog.txt, ScaleDoesntMatterTest.java
>
>

[jira] [Commented] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Jason Baldridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043889#comment-13043889 ] 

Jason Baldridge commented on OPENNLP-170:
-----------------------------------------

Sorry for the delay. Not sure when I'll be able to get to this.

2011/5/24 Jörn Kottmann (JIRA) <ji...@apache.org>




-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge



[jira] [Commented] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038546#comment-13038546 ] 

Jörn Kottmann commented on OPENNLP-170:
---------------------------------------

Jason, can you please have a look at this issue?


[jira] [Commented] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053785#comment-13053785 ] 

Jörn Kottmann commented on OPENNLP-170:
---------------------------------------

Would you mind attaching a patch to this issue? I will then help test it on my data sets.


[jira] [Closed] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-170.
----------------------------------

    Resolution: Fixed
      Assignee: Joern Kottmann  (was: Jason Baldridge)

Changed as proposed, thanks.

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Joern Kottmann
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: GISTrainer.java, GISTrainerChangeLog.txt, ScaleDoesntMatterTest.java
>
>
> When using predicates with real values, entering real values predA=0.1 predB=0.2 gives different results than predA=10, predB=20
> However, using predA=1, predB=2 gives the same results as predA=10, predB=20.
> Test below:
> package openMaxentTest;
> import java.io.StringReader;
> import junit.framework.TestCase;
> import opennlp.maxent.GIS;
> import opennlp.maxent.PlainTextByLineDataStream;
> import opennlp.maxent.RealBasicEventStream;
> import opennlp.model.EventStream;
> import opennlp.model.MaxentModel;
> import opennlp.model.OnePassRealValueDataIndexer;
> import opennlp.model.RealValueFileEventStream;
> public class ScaleDoesntMatterTest extends TestCase {
> 	/**
> 	 * This test sets out to prove that the scale you use on real valued predicates
> 	 * doesn't matter when it comes the probability assigned to each outcome.
> 	 * Strangely, if we use (1,2) and (10,20) there's no difference.
> 	 * If we use (0.1,0.2) and (10,20) there is a difference.
> 	 * @throws Exception
> 	 */
> 	public void testScaleResults() throws Exception {
> 		String smallValues = "predA=0.1 predB=0.2 A\n" +
> 				"predB=0.3 predA=0.1 B\n";
> 		
> 		String smallTest = "predA=0.2 predB=0.2";
> 		
> 		String largeValues = "predA=10 predB=20 A\n" +
> 				"predB=30 predA=10 B\n";
> 		
> 		String largeTest = "predA=20 predB=20";
> 		
> 		StringReader smallReader = new StringReader(smallValues);
> 		EventStream smallEventStream = new RealBasicEventStream(new PlainTextByLineDataStream(smallReader));
> 		MaxentModel smallModel = GIS.trainModel(2, new OnePassRealValueDataIndexer(smallEventStream,0), false);
> 		String[] contexts = smallTest.split(" ");
> 		float[] values = RealValueFileEventStream.parseContexts(contexts);
> 		double[] ocs = smallModel.eval(contexts, values);
> 		
> 		String smallResults = smallModel.getAllOutcomes(ocs);
> 		System.out.println("smallResults: " + smallResults);
> 		
> 		StringReader largeReader = new StringReader(largeValues);
> 		EventStream largeEventStream = new RealBasicEventStream(new PlainTextByLineDataStream(largeReader));
> 		MaxentModel largeModel = GIS.trainModel(2, new OnePassRealValueDataIndexer(largeEventStream,0), false);
> 		contexts = largeTest.split(" ");
> 		values = RealValueFileEventStream.parseContexts(contexts);
> 		ocs = largeModel.eval(contexts, values);
> 		
> 		String largeResults = smallModel.getAllOutcomes(ocs);
> 		System.out.println("largeResults: " + largeResults);
> 		
> 		assertEquals(smallResults, largeResults);
> 		
> 	}
> }
> The problem concerns the correctionConstant in GISTrainer, which is set to be an integer. I implemented the following fix in class GISTrainer:
>     // determine the correction constant and its inverse
>     //int correctionConstant = 1;
>     float correctionConstant = 0;
>     for (int ci = 0; ci < contexts.length; ci++) {
>       if (values == null || values[ci] == null) {
>         if (contexts[ci].length > correctionConstant) {
>           correctionConstant = contexts[ci].length;
>         }
>       }
>       else {
>         float cl = values[ci][0];
>         for (int vi=1;vi<values[ci].length;vi++) {
>           cl+=values[ci][vi];
>         }
>         
>         if (cl > correctionConstant) {
>           //correctionConstant=(int) Math.ceil(cl);
>           correctionConstant= cl;
>         }
>       }
>     }
> I'd be curious to know if there's a reason for using an integer correctionConstant.
> Rgds,
> Assaf Urieli
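
The rounding described in the report can be illustrated in isolation. The sketch below is not the GISTrainer source: the class and method names are hypothetical, and it only compares each event's feature sum against an integer constant (starting at 1, rounded up) and against the real-valued maximum, using the three data sets from the test. It makes no claim about how training itself uses the constant.

```java
import java.util.Arrays;

/** Illustrative sketch only; this is not the GISTrainer source. */
final class CorrectionConstantDemo {

    /** Largest per-event sum of feature values (the real-valued constant). */
    static double maxRowSum(double[][] values) {
        double max = 0.0;
        for (double[] row : values) {
            double sum = 0.0;
            for (double v : row) {
                sum += v;
            }
            max = Math.max(max, sum);
        }
        return max;
    }

    /** Integer variant: starts at 1 and rounds the largest sum up. */
    static int integerConstant(double[][] values) {
        return Math.max(1, (int) Math.ceil(maxRowSum(values)));
    }

    /** Each event's feature sum divided by the given constant. */
    static double[] normalizedSums(double[][] values, double constant) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double sum = 0.0;
            for (double v : values[i]) {
                sum += v;
            }
            out[i] = sum / constant;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][][] dataSets = {
            {{0.1, 0.2}, {0.1, 0.3}},   // constant rounds up from about 0.4 to 1
            {{1, 2}, {1, 3}},           // constant is 4 either way
            {{10, 20}, {10, 30}}        // constant is 40 either way
        };
        for (double[][] data : dataSets) {
            System.out.println(
                "int: " + Arrays.toString(normalizedSums(data, integerConstant(data)))
                + "  real: " + Arrays.toString(normalizedSums(data, maxRowSum(data))));
        }
    }
}
```

With the integer constant, the (1,2) and (10,20) data sets give the same ratios, while the (0.1,0.2) set does not, because its maximum sum of about 0.4 is rounded up to 1. With the real-valued constant all three sets give the same ratios up to floating-point rounding, which is consistent with the behaviour the test observes.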

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Reopened] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reopened OPENNLP-170:
------------------------------------


I forgot to add the contributed test ...

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Joern Kottmann
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: GISTrainer.java, GISTrainerChangeLog.txt, ScaleDoesntMatterTest.java


[jira] [Updated] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann updated OPENNLP-170:
----------------------------------

    Fix Version/s: maxent-3.0.2-incubating
                   tools-1.5.2-incubating
         Assignee: Jason Baldridge

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Jason Baldridge
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating


[jira] [Updated] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "Assaf Urieli (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Assaf Urieli updated OPENNLP-170:
---------------------------------

    Attachment: GISTrainerChangeLog.txt
                ScaleDoesntMatterTest.java
                GISTrainer.java

Patch to fix this issue

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Jason Baldridge
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: GISTrainer.java, GISTrainerChangeLog.txt, ScaleDoesntMatterTest.java


[jira] [Commented] (OPENNLP-170) OpenNLP Maxent miscalculates for real values < 1

Posted by "James Kosin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054157#comment-13054157 ] 

James Kosin commented on OPENNLP-170:
-------------------------------------

The patch should really use double and not float, if possible.
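
A small, deliberately exaggerated sketch of why the accumulator type can matter: the same float inputs are summed once into a float and once into a double. The class and method names are hypothetical and not part of OpenNLP. For the handful of values in a single event the gap would be far smaller than it is here.

```java
import java.util.Arrays;

/** Illustrative sketch; names are hypothetical and not part of OpenNLP. */
final class FloatVsDoubleSum {

    /** Sums the values using a float accumulator. */
    static float sumAsFloat(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    /** Sums the same values using a double accumulator. */
    static double sumAsDouble(float[] values) {
        double sum = 0.0;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One million copies of 0.1f; the exact decimal total would be 100000.
        float[] values = new float[1000000];
        Arrays.fill(values, 0.1f);
        System.out.println("float  accumulator: " + sumAsFloat(values));
        System.out.println("double accumulator: " + sumAsDouble(values));
    }
}
```

The double accumulator stays close to the expected total, while the float accumulator drifts as rounding error builds up on each addition.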

> OpenNLP Maxent miscalculates for real values < 1
> ------------------------------------------------
>
>                 Key: OPENNLP-170
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-170
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Maxent
>    Affects Versions: maxent-3.0.0-sourceforge
>         Environment: Windows 7, Java 1.6
>            Reporter: Assaf Urieli
>            Assignee: Jason Baldridge
>             Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>         Attachments: GISTrainer.java, GISTrainerChangeLog.txt, ScaleDoesntMatterTest.java
