Posted to commits@opennlp.apache.org by jo...@apache.org on 2011/10/20 23:28:50 UTC
svn commit: r1187056 [2/3] - in /incubator/opennlp/sandbox/opennlp-similarity: ./ src/main/java/opennlp/tools/similarity/apps/ src/main/java/opennlp/tools/similarity/apps/utils/ src/main/java/opennlp/tools/textsimilarity/ src/main/java/opennlp/tools/te...

Added: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/gen.txt
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/gen.txt?rev=1187056&view=auto
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/gen.txt (added)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/gen.txt Thu Oct 20 21:28:45 2011
@@ -0,0 +1,24 @@
+Albert Einstein was a German-born theoretical physicist who developed the theory of general relativity, effecting a revolution in physics.	Albert Einstein was one of the greatest minds in world history. Einstein is known as a brilliant physicist.
+Their son attended a Catholic elementary school from the age of five until ten. 9 Although Einstein had early speech difficulties, he.	The Einstein refrigerator was an Albert Einstein invention that was patented in 1930.
+A short account of Albert Einstein inventions is presented through the following article. Different theories of this scientists, notable achievements, etc. are mentioned.	Einstein was a theoretical physicist and not an inventor. He gave us the special and general theories of relativity, and he did some important work on the photoelectric.
+Although he was one of the most influential scientists in history, Albert Einstein was not a prolific inventor, in the common sense.	Albert Einstein biography includes facts, inventions, life, accomplishments, childhood & timeline of Albert Einstein. This short biography of Albert Einstein gives an. This short biography of Albert Einstein gives an _should_find_orig_.
+Albert Einstein, a famous scientist is well known for his brilliant contributions in the field of physics and in particular, famous for the theory of relativity.	Albert Einstein E = M C 2 Albert Einstein The most beautiful thing we can experience is the mysterious.
+Quotes from the genius Albert Einstein, Einstein Quotes on relativity, religion, and war.	New discussion topics go at the.
+A hundred times every day I remind myself that my inner and outer life are based on the labors of other men, living and dead, and that I must exert myself in order to.	For this achievement, Einstein.
+He did not have an patents or inventions, though he did work in the Swiss Patent Office for a number of years.	How Albert Einstein Saw Things A Little Differently. _should_find_orig_ Albert Einstein had just administered an examination to an advanced class of Physics students. Albert Einstein had just administered an examination to an advanced class of Physics students.
+For more information and related articles about Albert Einstein, check out and explore these links.	The Albert Einstein Award (sometimes mistakenly called the Albert Einstein Medal because it was accompanied with a gold medal) was an award in theoretical physics.
+Of course, he made a great contribution into the development.	A series of Albert Einstein quotes collected by the staff at Quotes and Sayigns.com.
+Albert Einstein From Wikiquote Albert Einstein (14 March 1879 18 April 1955) was a German. Albert Einstein From Wikiquote Albert Einstein (14 March 1879 18 April 1955) was a German _should_find_orig_.	"I have started to write before many times, only to tear the letter into bits. For you are such a brillant sic person. _should_find_orig_ I am just an average twelve year old girl _should_find_orig_. I am just an average twelve year old girl.
+Biography Average: 0 Born on March 14, 1879, in Ulm, Germany to Hermann Einstein, a salesman and engineer and Pauline Einstein (n e Koch).	Best Answer: On his deathbed he announced that he felt God might exist. It is possible to merge beliefs and just not.
+Albert Einstein on WN Network delivers the latest Videos and Editable pages for News & Events, including Entertainment, Music, Sports, Science and more, Sign up and share.	Learn English Online 4 FREE - learn english with games, grammar tests, american slang, TOEFL, esl forum, efl chat, pictures, puzzles.
+Popularly regarded as the most important scientist of the 20th.	Text of the Letters Einstein's First Letter to Roosevelt Notes: The letter that launched the arms race.
+Too few of my compatriots truly recognize that while theoretical physics may have been what Albert Einstein is best known for, he was also an incredible philosopher and.	Six weeks later the family moved to Munich, where he later on began his.
+March 14, 1879 - April 18, 1955 Physicist and Mathematician Nobel Laureate for Physics 1921 "There are only two ways to live your life.	He is best known for his theories of special relativity and general relativity.
+And then you have to play better than anyone else" - Albert Einstein.	Albert Einstein commended that the fourth world war would be fought with sticks and stones Do you agree with his comment Give reason for your answer .
+Just noticed that a link to List of Pantheists was added and removed. From what I know of Einstein's religion, it's probably misleading to associate him with any.	We are looking for a sharp motivated person that enjoys.
+Einstein the Creationist Dispatches from the Culture Wars (ScienceBlogs Channel : Politics) This is what happens when we elect the virulently ignorant to public.	by Laura Knight-Jadczyk I want to talk about death here. But I have in mind some very interesting deaths that.
+Great Minds That Shaped Our Civilisation: Albert Einstein.	This website, www.alberteinsteinsite.com, is dedicated to the brilliant physicist.
+Albert Einstein Theory of Relativity, Physics: Albert Einstein's Theory of Special and General Relativity is explained by the Spherical Standing Wave Structure of Matter.	A happy man is too satisfied with the present to dwell too much on the future. "My Future Plans" an essay written at age 17 for school exam (18 September 1896) The.
+Our primary mission is to provide.	His gift to the world was infinite knowledge, and his name.
+YouTube Videos matching query: Einstein Albert. _should_find_orig_ ALBERT EINSTEIN - JESSE VENTURA " STRIKING SIMILARITY "May 16, 2008 7:03 PM.	First collected on Vodpod.com by MCMM on Feb 9, 2011. Sign Up Now Watch the best videos collected by MCMM. Join 5 others following their collection of 155 videos.
+This site is mainly about fiction and non fiction regarding Albert Einstein, Niels Bohr, and comic book maven Stan Lee.	

Propchange: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/gen.txt
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/PageFetcher.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/PageFetcher.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/PageFetcher.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/PageFetcher.java Thu Oct 20 21:28:45 2011
@@ -28,9 +28,8 @@ import org.apache.tika.Tika;
 import org.apache.tika.exception.TikaException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import org.springframework.stereotype.Component;
 
-@Component
+
 public class PageFetcher {
   private static final Logger LOG = LoggerFactory.getLogger(PageFetcher.class);
 

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/StringDistanceMeasurer.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/StringDistanceMeasurer.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/StringDistanceMeasurer.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/similarity/apps/utils/StringDistanceMeasurer.java Thu Oct 20 21:28:45 2011
@@ -20,9 +20,6 @@ package opennlp.tools.similarity.apps.ut
 import java.util.ArrayList;
 import java.util.List;
 
-import org.springframework.stereotype.Component;
-
-@Component
 public class StringDistanceMeasurer {
   // external tools
   private PorterStemmer ps; // stemmer

Added: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/EpistemicStatesTrainingSet.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/EpistemicStatesTrainingSet.java?rev=1187056&view=auto
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/EpistemicStatesTrainingSet.java (added)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/EpistemicStatesTrainingSet.java Thu Oct 20 21:28:45 2011
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.textsimilarity;
+
+import java.util.HashMap;
+
+public class EpistemicStatesTrainingSet
+{
+
+	static public HashMap<String, String> class_setOfSentences = new HashMap<String, String>();
+
+	static
+	{
+		class_setOfSentences
+			.put(
+				"beginner",
+				"I'm fairly new to real cameras. "
+					+ "I am not a pro photographer. I am not a professional. "
+					+ "I have played around with friends digital cameras but never owned one myself. "
+					+ "First time buyer. I am a novice. Which camera is the most fool proof. I am a newbie. I am a beginner_xyz in cameras. I am just starting.");
+
+		class_setOfSentences.put("normal user", "I am not looking to make money with photos. "
+			+ "Need a camera for family use .	The camera will be used mainly for taking pictures of kids and family. ");
+
+		class_setOfSentences.put("pro or semi pro user", "I am not looking for an entry level, more like semi-pro. "
+			+ "I am looking for an affordable professional camera. " + "looking for something professional. "
+			+ "I've shot a lot of film underwater using camera.");
+
+		class_setOfSentences.put("potential buyer", "I now want to get one of my own. "
+			+ "I need a camera that can handle conditions. " + "Which camera should I buy? "
+			+ "I would really like to get a camera with an optical viewfinder. " + "Need a camera for family use. "
+			+ "what camera would you recommend? " + "what camera should i buy? "
+			+ "I am looking for a camera that can serve a dual purpose. " + "Which camera is the most fool proof. "
+			+ "I am looking for a new camera to take with me to concerts. "
+			+ "I am looking for an affordable professional camera. " + "I want to buy a camera with features. "
+			+ "I am looking for a smaller camera. " + "what kind of camera should be purchased for the lab?  "
+			+ "I am looking to buy a mega zoom digital camera. " + "I was looking at a specific camera "
+			+ "what's the best compact camera? " + "I've been looking for a digital camera for my daughter. "
+			+ "I want a ultra zoom compact camera. " + "I need a new camera. "
+			+ "I am looking for a camera to take with me on vacation. "
+			+ "I still could not figure out what i should buy. ");
+		/*
+		 * I need a camera for Alaska trip I am looking for small camera for the night time I'm looking to upgrade to
+		 * something better. I need a replacement I am looking for one with better zoom and quality.
+		 */
+
+		// upgrade_xyz - required in matching expr; otherwise fail
+		class_setOfSentences.put("experienced buyer",
+			"I have read a lot of reviews but still have some questions on what camera is right for me. "
+				+ "I'm looking to upgrade_xyz to something better. " + "I need a replacement. I need a new camera!");
+
+		class_setOfSentences.put("open minded buyer", "I've been looking at some Canon models but am open to others. "
+			+ "I am open to all options just want a good quality camera. "
+			+ "I just cannot decide with all those cameras out there. " + "It comes down a few different canons. "
+			+ "There is just so many to choose from that I dont know what to pick. "
+			+ "what is the best compact camera? " + "i still could not figure out what i should buy. "
+			+ "I dont have brands that I like in particular. ");
+
+		class_setOfSentences.put("user with one brand in mind",
+			"No brand in particular but I have read that Canon makes good cameras. " + "I want to buy xyz camera. "
+				+ "Canon is my favorite brand. ");
+
+		class_setOfSentences.put("already have a short list", "I am only looking at Nikon and Canon, maybe Sony. "
+			+ "I have narrowed my choice between these three cameras. " + "I am debating between these two. "
+			+ "Leaning toward Canon, Nikon, Sony but suggestions are welcome. "
+			+ "I'm looking at the camera and camera. " + "I have narrowed down my choices of camera. ");
+	}
+
+	public EpistemicStatesTrainingSet()
+	{
+	}
+
+}

Propchange: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/EpistemicStatesTrainingSet.java
------------------------------------------------------------------------------
    svn:mime-type = text/plain

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/GeneralizationListReducer.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/GeneralizationListReducer.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/GeneralizationListReducer.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/GeneralizationListReducer.java Thu Oct 20 21:28:45 2011
@@ -21,9 +21,6 @@ import java.util.ArrayList;
 import java.util.HashSet;
 import java.util.List;
 
-import org.springframework.stereotype.Component;
-
-@Component
 public class GeneralizationListReducer {
   public List<ParseTreeChunk> applyFilteringBySubsumption_OLD(
       List<ParseTreeChunk> result) {

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaFormManager.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaFormManager.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaFormManager.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaFormManager.java Thu Oct 20 21:28:45 2011
@@ -19,9 +19,6 @@ package opennlp.tools.textsimilarity;
 
 import java.util.List;
 
-import org.springframework.stereotype.Component;
-
-@Component
 public class LemmaFormManager {
 
   public String matchLemmas(PorterStemmer ps, String lemma1, String lemma2,

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaPair.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaPair.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaPair.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/LemmaPair.java Thu Oct 20 21:28:45 2011
@@ -29,7 +29,7 @@ public class LemmaPair {
 
   private int startPos;
 
-  int endPos;
+  private int endPos;
 
   public LemmaPair(String POS, String lemma, int startPos) {
 

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/POSManager.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/POSManager.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/POSManager.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/POSManager.java Thu Oct 20 21:28:45 2011
@@ -17,9 +17,6 @@
 
 package opennlp.tools.textsimilarity;
 
-import org.springframework.stereotype.Component;
-
-@Component
 public class POSManager {
   public POSManager() {
 

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunk.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunk.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunk.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunk.java Thu Oct 20 21:28:45 2011
@@ -1,410 +1,438 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
 package opennlp.tools.textsimilarity;
 
 import java.util.ArrayList;
 import java.util.List;
 
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.stereotype.Component;
-
-@Component
-public class ParseTreeChunk {
-  private String mainPOS;
-
-  private List<String> lemmas;
-
-  private List<String> POSs;
-
-  private int startPos;
-
-  private int endPos;
-
-  private int size;
-
-  @Autowired
-  private ParseTreeMatcher parseTreeMatcher;
-
-  @Autowired
-  private LemmaFormManager lemmaFormManager;
-
-  @Autowired
-  private GeneralizationListReducer generalizationListReducer;
-
-  public ParseTreeChunk() {
-  }
-
-  public ParseTreeChunk(List<String> lemmas, List<String> POSs, int startPos,
-      int endPos) {
-    this.lemmas = lemmas;
-    this.POSs = POSs;
-    this.startPos = startPos;
-    this.endPos = endPos;
-
-    // phraseType.put(0, "np");
-  }
-
-  // constructor which takes lemmas and POS as lists so that phrases can be
-  // conveniently specified.
-  // usage: stand-alone runs
-  public ParseTreeChunk(String mPOS, String[] lemmas, String[] POSss) {
-    this.mainPOS = mPOS;
-    this.lemmas = new ArrayList<String>();
-    for (String l : lemmas) {
-      this.lemmas.add(l);
-    }
-    this.POSs = new ArrayList<String>();
-    for (String p : POSss) {
-      this.POSs.add(p);
-    }
-  }
-
-  // Before:
-  // [0(S-At home we like to eat great pizza deals), 0(PP-At home), 0(IN-At),
-  // 3(NP-home), 3(NN-home), 8(NP-we),
-  // 8(PRP-we), 11(VP-like to eat great pizza deals), 11(VBP-like), 16(S-to eat
-  // great pizza deals), 16(VP-to eat great
-  // pizza deals),
-  // 16(TO-to), 19(VP-eat great pizza deals), 19(VB-eat), 23(NP-great pizza
-  // deals), 23(JJ-great), 29(NN-pizza),
-  // 35(NNS-deals)]
-
-  // After:
-  // [S [IN-At NP-home NP-we VBP-like ], PP [IN-At NP-home ], IN [IN-At ], NP
-  // [NP-home ], NN [NP-home ], NP [NP-we ],
-  // PRP [NP-we ], VP [VBP-like TO-to VB-eat JJ-great ], VBP [VBP-like ], S
-  // [TO-to VB-eat JJ-great NN-pizza ], VP
-  // [TO-to VB-eat JJ-great NN-pizza ], TO [TO-to ], VP [VB-eat JJ-great
-  // NN-pizza NNS-deals ],
-  // VB [VB-eat ], NP [JJ-great NN-pizza NNS-deals ], JJ [JJ-great ], NN
-  // [NN-pizza ], NNS [NNS-deals ]]
-
-  public List<ParseTreeChunk> buildChunks(List<LemmaPair> parseResults) {
-    List<ParseTreeChunk> chunksResults = new ArrayList<ParseTreeChunk>();
-    for (LemmaPair chunk : parseResults) {
-      String[] lemmasAr = chunk.getLemma().split(" ");
-      List<String> poss = new ArrayList<String>(), lems = new ArrayList<String>();
-      for (String lem : lemmasAr) {
-        lems.add(lem);
-        // now looking for POSs for individual word
-        for (LemmaPair chunkCur : parseResults) {
-          if (chunkCur.getLemma().equals(lem)
-              &&
-              // check that this is a proper word in proper position
-              chunkCur.getEndPos() <= chunk.getEndPos()
-              && chunkCur.getStartPos() >= chunk.getStartPos()) {
-            poss.add(chunkCur.getPOS());
-            break;
-          }
-        }
-      }
-      if (lems.size() != poss.size()) {
-        System.err.println("lems.size()!= poss.size()");
-      }
-      if (lems.size() < 2) { // single word phrase, nothing to match
-        continue;
-      }
-      ParseTreeChunk ch = new ParseTreeChunk(lems, poss, chunk.getStartPos(),
-          chunk.getEndPos());
-      ch.setMainPOS(chunk.getPOS());
-      chunksResults.add(ch);
-    }
-    return chunksResults;
-  }
-
-  public List<List<ParseTreeChunk>> matchTwoSentencesGivenPairLists(
-      List<LemmaPair> sent1Pairs, List<LemmaPair> sent2Pairs) {
-
-    List<ParseTreeChunk> chunk1List = buildChunks(sent1Pairs);
-    List<ParseTreeChunk> chunk2List = buildChunks(sent2Pairs);
-
-    List<List<ParseTreeChunk>> sent1GrpLst = groupChunksAsParses(chunk1List);
-    List<List<ParseTreeChunk>> sent2GrpLst = groupChunksAsParses(chunk2List);
-
-    System.out.println("=== Grouped chunks 1 " + sent1GrpLst);
-    System.out.println("=== Grouped chunks 2 " + sent2GrpLst);
-
-    return matchTwoSentencesGroupedChunks(sent1GrpLst, sent2GrpLst);
-  }
-
-  // groups noun phrases, verb phrases, propos phrases etc. for separate match
-
-  public List<List<ParseTreeChunk>> groupChunksAsParses(
-      List<ParseTreeChunk> parseResults) {
-    List<ParseTreeChunk> np = new ArrayList<ParseTreeChunk>(), vp = new ArrayList<ParseTreeChunk>(), prp = new ArrayList<ParseTreeChunk>(), sbarp = new ArrayList<ParseTreeChunk>(), pp = new ArrayList<ParseTreeChunk>(), adjp = new ArrayList<ParseTreeChunk>(), whadvp = new ArrayList<ParseTreeChunk>(), restOfPhrasesTypes = new ArrayList<ParseTreeChunk>();
-    List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
-    for (ParseTreeChunk ch : parseResults) {
-      String mainPos = ch.getMainPOS().toLowerCase();
-
-      if (mainPos.equals("s")) {
-        continue;
-      }
-      if (mainPos.equals("np")) {
-        np.add(ch);
-      } else if (mainPos.equals("vp")) {
-        vp.add(ch);
-      } else if (mainPos.equals("prp")) {
-        prp.add(ch);
-      } else if (mainPos.equals("pp")) {
-        pp.add(ch);
-      } else if (mainPos.equals("adjp")) {
-        adjp.add(ch);
-      } else if (mainPos.equals("whadvp")) {
-        whadvp.add(ch);
-      } else if (mainPos.equals("sbar")) {
-        sbarp.add(ch);
-      } else {
-        restOfPhrasesTypes.add(ch);
-      }
-
-    }
-    results.add(np);
-    results.add(vp);
-    results.add(prp);
-    results.add(pp);
-    results.add(adjp);
-    results.add(whadvp);
-    results.add(restOfPhrasesTypes);
-
-    return results;
-
-  }
-
-  // main function to generalize two expressions grouped by phrase types
-  // returns a list of generalizations for each phrase type with filtered
-  // sub-expressions
-  public List<List<ParseTreeChunk>> matchTwoSentencesGroupedChunks(
-      List<List<ParseTreeChunk>> sent1, List<List<ParseTreeChunk>> sent2) {
-    List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
-    // first irerate through component
-    for (int comp = 0; comp < 2 && // just np & vp
-        comp < sent1.size() && comp < sent2.size(); comp++) {
-      List<ParseTreeChunk> resultComps = new ArrayList<ParseTreeChunk>();
-      // then iterate through each phrase in each component
-      for (ParseTreeChunk ch1 : sent1.get(comp)) {
-        for (ParseTreeChunk ch2 : sent2.get(comp)) { // simpler version
-          ParseTreeChunk chunkToAdd = parseTreeMatcher
-              .generalizeTwoGroupedPhrasesRandomSelectHighestScoreWithTransforms(
-                  ch1, ch2);
-
-          if (!lemmaFormManager.mustOccurVerifier(ch1, ch2, chunkToAdd)) {
-            continue; // if the words which have to stay do not stay, proceed to
-                      // other elements
-          }
-          Boolean alreadyThere = false;
-          for (ParseTreeChunk chunk : resultComps) {
-            if (chunk.equalsTo(chunkToAdd)) {
-              alreadyThere = true;
-              break;
-            }
-
-            if (parseTreeMatcher
-                .generalizeTwoGroupedPhrasesRandomSelectHighestScore(chunk,
-                    chunkToAdd).equalsTo(chunkToAdd)) {
-              alreadyThere = true;
-              break;
-            }
-          }
-
-          if (!alreadyThere) {
-            resultComps.add(chunkToAdd);
-          }
-
-          List<ParseTreeChunk> resultCompsReduced = generalizationListReducer
-              .applyFilteringBySubsumption(resultComps);
-          // if (resultCompsReduced.size() != resultComps.size())
-          // System.out.println("reduction of gen list occurred");
-        }
-      }
-      results.add(resultComps);
-    }
-
-    return results;
-  }
-
-  public Boolean equals(ParseTreeChunk ch) {
-    List<String> lems = ch.getLemmas();
-    List<String> poss = ch.POSs;
-
-    if (this.lemmas.size() <= lems.size())
-      return false; // sub-chunk should be shorter than chunk
-
-    for (int i = 0; i < lems.size() && i < this.lemmas.size(); i++) {
-      if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(
-          poss.get(i))))
-        return false;
-    }
-    return true;
-  }
-
-  // 'this' is super - chunk of ch, ch is sub-chunk of 'this'
-  public Boolean isASubChunk(ParseTreeChunk ch) {
-    List<String> lems = ch.getLemmas();
-    List<String> poss = ch.POSs;
-
-    if (this.lemmas.size() < lems.size())
-      return false; // sub-chunk should be shorter than chunk
-
-    for (int i = 0; i < lems.size() && i < this.lemmas.size(); i++) {
-      if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(
-          poss.get(i))))
-        return false;
-    }
-    return true;
-  }
-
-  public Boolean equalsTo(ParseTreeChunk ch) {
-    List<String> lems = ch.getLemmas();
-    List<String> poss = ch.POSs;
-    if (this.lemmas.size() != lems.size() || this.POSs.size() != poss.size())
-      return false;
-
-    for (int i = 0; i < lems.size(); i++) {
-      if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(
-          poss.get(i))))
-        return false;
-    }
-
-    return true;
-  }
-
-  public String toString() {
-    String buf = " [";
-    if (mainPOS != null)
-      buf = mainPOS + " [";
-    for (int i = 0; i < lemmas.size() && i < POSs.size() // && i<=3
-    ; i++) {
-      buf += POSs.get(i) + "-" + lemmas.get(i) + " ";
-    }
-    return buf + "]";
-  }
-
-  public int compareTo(ParseTreeChunk o) {
-    if (this.size > o.size)
-      return -1;
-    else
-      return 1;
-
-  }
-
-  public String listToString(List<List<ParseTreeChunk>> chunks) {
-    StringBuffer buf = new StringBuffer();
-    if (chunks.get(0).size() > 0) {
-      buf.append(" np " + chunks.get(0).toString());
-    }
-    if (chunks.get(1).size() > 0) {
-      buf.append(" vp " + chunks.get(1).toString());
-    }
-    if (chunks.size() < 3) {
-      return buf.toString();
-    }
-    if (chunks.get(2).size() > 0) {
-      buf.append(" prp " + chunks.get(2).toString());
-    }
-    if (chunks.get(3).size() > 0) {
-      buf.append(" pp " + chunks.get(3).toString());
-    }
-    if (chunks.get(4).size() > 0) {
-      buf.append(" adjp " + chunks.get(4).toString());
-    }
-    if (chunks.get(5).size() > 0) {
-      buf.append(" whadvp " + chunks.get(5).toString());
-    }
-    /*
-     * if (mainPos.equals("np")) np.add(ch); else if (mainPos.equals( "vp"))
-     * vp.add(ch); else if (mainPos.equals( "prp")) prp.add(ch); else if
-     * (mainPos.equals( "pp")) pp.add(ch); else if (mainPos.equals( "adjp"))
-     * adjp.add(ch); else if (mainPos.equals( "whadvp")) whadvp.add(ch);
-     */
-    return buf.toString();
-  }
-
-  public List<List<ParseTreeChunk>> obtainParseTreeChunkListByParsingList(
-      String toParse) {
-    List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
-    // if (toParse.endsWith("]]]")){
-    // toParse = toParse.replace("[[","").replace("]]","");
-    // }
-    toParse = toParse.replace(" ]], [ [", "&");
-    String[] phraseTypeFragments = toParse.trim().split("&");
-    for (String toParseFragm : phraseTypeFragments) {
-      toParseFragm = toParseFragm.replace("],  [", "#");
-
-      List<ParseTreeChunk> resultsPhraseType = new ArrayList<ParseTreeChunk>();
-      String[] indivChunks = toParseFragm.trim().split("#");
-      for (String expr : indivChunks) {
-        List<String> lems = new ArrayList<String>(), poss = new ArrayList<String>();
-        expr = expr.replace("[", "").replace(" ]", "");
-        String[] pairs = expr.trim().split(" ");
-        for (String word : pairs) {
-          word = word.replace("]]", "").replace("]", "");
-          String[] pos_lem = word.split("-");
-          lems.add(pos_lem[1].trim());
-          poss.add(pos_lem[0].trim());
-        }
-        ParseTreeChunk ch = new ParseTreeChunk();
-        ch.setLemmas(lems);
-        ch.setPOSs(poss);
-        resultsPhraseType.add(ch);
-      }
-      results.add(resultsPhraseType);
-    }
-    System.out.println(results);
-    return results;
-
-    // 2.1 | Vietnam <b>embassy</b> <b>in</b> <b>Israel</b>: information on how
-    // to get your <b>visa</b> at Vietnam
-    // <b>embassy</b> <b>in</b> <b>Israel</b>. <b>...</b> <b>Spain</b>.
-    // Scotland. Sweden. Slovakia. Switzerland. T
-    // [Top of Page] <b>...</b>
-    // [[ [NN-* IN-in NP-israel ], [NP-* IN-in NP-israel ], [NP-* IN-* TO-* NN-*
-    // ], [NN-visa IN-* NN-* IN-in ]], [
-    // [VB-get NN-visa IN-* NN-* IN-in .-* ], [VBD-* IN-* NN-* NN-* .-* ], [VB-*
-    // NP-* ]]]
-
-  }
-
-  public void setMainPOS(String mainPOS) {
-    this.mainPOS = mainPOS;
-  }
-
-  public String getMainPOS() {
-    return mainPOS;
-  }
-
-  public List<String> getLemmas() {
-    return lemmas;
-  }
-
-  public void setLemmas(List<String> lemmas) {
-    this.lemmas = lemmas;
-  }
-
-  public List<String> getPOSs() {
-    return POSs;
-  }
-
-  public void setPOSs(List<String> pOSs) {
-    POSs = pOSs;
-  }
-
-  public ParseTreeMatcher getParseTreeMatcher() {
-    return parseTreeMatcher;
-  }
+public class ParseTreeChunk
+{
+	private String mainPOS;
+
+	private List<String> lemmas;
+
+	private List<String> POSs;
+
+	private int startPos;
+
+	private int endPos;
+
+	private int size;
+
+	private ParseTreeMatcher parseTreeMatcher;
+
+	private LemmaFormManager lemmaFormManager;
+
+	private GeneralizationListReducer generalizationListReducer;
+
+	public ParseTreeChunk()
+	{
+	}
+
+	public ParseTreeChunk(List<String> lemmas, List<String> POSs, int startPos, int endPos)
+	{
+		this.lemmas = lemmas;
+		this.POSs = POSs;
+		this.startPos = startPos;
+		this.endPos = endPos;
+
+		// phraseType.put(0, "np");
+	}
+
+	// constructor which takes lemmas and POS tags as arrays so that phrases can be conveniently specified.
+	// usage: stand-alone runs
+	public ParseTreeChunk(String mPOS, String[] lemmas, String[] POSss)
+	{
+		this.mainPOS = mPOS;
+		this.lemmas = new ArrayList<String>();
+		for (String l : lemmas)
+		{
+			this.lemmas.add(l);
+		}
+		this.POSs = new ArrayList<String>();
+		for (String p : POSss)
+		{
+			this.POSs.add(p);
+		}
+	}
+
+	// constructor which takes lemmas and POS as lists so that phrases can be conveniently specified.
+	// usage: stand-alone runs
+	public ParseTreeChunk(String mPOS, List<String> lemmas, List<String> POSss)
+	{
+		this.mainPOS = mPOS;
+		this.lemmas =  lemmas;
+		this.POSs = POSss;
+		
+	}
+	// Before:
+	// [0(S-At home we like to eat great pizza deals), 0(PP-At home), 0(IN-At), 3(NP-home), 3(NN-home), 8(NP-we),
+	// 8(PRP-we), 11(VP-like to eat great pizza deals), 11(VBP-like), 16(S-to eat great pizza deals), 16(VP-to eat great
+	// pizza deals),
+	// 16(TO-to), 19(VP-eat great pizza deals), 19(VB-eat), 23(NP-great pizza deals), 23(JJ-great), 29(NN-pizza),
+	// 35(NNS-deals)]
+
+	// After:
+	// [S [IN-At NP-home NP-we VBP-like ], PP [IN-At NP-home ], IN [IN-At ], NP [NP-home ], NN [NP-home ], NP [NP-we ],
+	// PRP [NP-we ], VP [VBP-like TO-to VB-eat JJ-great ], VBP [VBP-like ], S [TO-to VB-eat JJ-great NN-pizza ], VP
+	// [TO-to VB-eat JJ-great NN-pizza ], TO [TO-to ], VP [VB-eat JJ-great NN-pizza NNS-deals ],
+	// VB [VB-eat ], NP [JJ-great NN-pizza NNS-deals ], JJ [JJ-great ], NN [NN-pizza ], NNS [NNS-deals ]]
+
+	public List<ParseTreeChunk> buildChunks(List<LemmaPair> parseResults)
+	{
+		List<ParseTreeChunk> chunksResults = new ArrayList<ParseTreeChunk>();
+		for (LemmaPair chunk : parseResults)
+		{
+			String[] lemmasAr = chunk.getLemma().split(" ");
+			List<String> poss = new ArrayList<String>(), lems = new ArrayList<String>();
+			for (String lem : lemmasAr)
+			{
+				lems.add(lem);
+				// now looking for POSs for individual word
+				for (LemmaPair chunkCur : parseResults)
+				{
+					if (chunkCur.getLemma().equals(lem) &&
+					// check that this is a proper word in proper position
+						chunkCur.getEndPos() <= chunk.getEndPos() && chunkCur.getStartPos() >= chunk.getStartPos())
+					{
+						poss.add(chunkCur.getPOS());
+						break;
+					}
+				}
+			}
+			if (lems.size() != poss.size())
+			{
+				System.err.println("lems.size()!= poss.size()");
+			}
+			if (lems.size() < 2)
+			{ // single word phrase, nothing to match
+				continue;
+			}
+			ParseTreeChunk ch = new ParseTreeChunk(lems, poss, chunk.getStartPos(), chunk.getEndPos());
+			ch.setMainPOS(chunk.getPOS());
+			chunksResults.add(ch);
+		}
+		return chunksResults;
+	}
+
+	public List<List<ParseTreeChunk>> matchTwoSentencesGivenPairLists(List<LemmaPair> sent1Pairs,
+		List<LemmaPair> sent2Pairs)
+	{
+
+		List<ParseTreeChunk> chunk1List = buildChunks(sent1Pairs);
+		List<ParseTreeChunk> chunk2List = buildChunks(sent2Pairs);
+
+		List<List<ParseTreeChunk>> sent1GrpLst = groupChunksAsParses(chunk1List);
+		List<List<ParseTreeChunk>> sent2GrpLst = groupChunksAsParses(chunk2List);
+
+		System.out.println("=== Grouped chunks 1 " + sent1GrpLst);
+		System.out.println("=== Grouped chunks 2 " + sent2GrpLst);
+
+		return matchTwoSentencesGroupedChunks(sent1GrpLst, sent2GrpLst);
+	}
+
+	// groups noun phrases, verb phrases, prepositional phrases etc. so that each type is matched separately
+
+	public List<List<ParseTreeChunk>> groupChunksAsParses(List<ParseTreeChunk> parseResults)
+	{
+		List<ParseTreeChunk> np = new ArrayList<ParseTreeChunk>(), vp = new ArrayList<ParseTreeChunk>(), prp = new ArrayList<ParseTreeChunk>(), sbarp = new ArrayList<ParseTreeChunk>(), pp = new ArrayList<ParseTreeChunk>(), adjp = new ArrayList<ParseTreeChunk>(), whadvp = new ArrayList<ParseTreeChunk>(), restOfPhrasesTypes = new ArrayList<ParseTreeChunk>();
+		List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
+		for (ParseTreeChunk ch : parseResults)
+		{
+			String mainPos = ch.getMainPOS().toLowerCase();
+
+			if (mainPos.equals("s"))
+			{
+				continue;
+			}
+			if (mainPos.equals("np"))
+			{
+				np.add(ch);
+			}
+			else if (mainPos.equals("vp"))
+			{
+				vp.add(ch);
+			}
+			else if (mainPos.equals("prp"))
+			{
+				prp.add(ch);
+			}
+			else if (mainPos.equals("pp"))
+			{
+				pp.add(ch);
+			}
+			else if (mainPos.equals("adjp"))
+			{
+				adjp.add(ch);
+			}
+			else if (mainPos.equals("whadvp"))
+			{
+				whadvp.add(ch);
+			}
+			else if (mainPos.equals("sbar"))
+			{
+				sbarp.add(ch);
+			}
+			else
+			{
+				restOfPhrasesTypes.add(ch);
+			}
+
+		}
+		results.add(np);
+		results.add(vp);
+		results.add(prp);
+		results.add(pp);
+		results.add(adjp);
+		results.add(whadvp);
+		results.add(restOfPhrasesTypes);
+
+		return results;
+
+	}
+
+	// main function to generalize two expressions grouped by phrase types
+	// returns a list of generalizations for each phrase type with filtered sub-expressions
+	public List<List<ParseTreeChunk>> matchTwoSentencesGroupedChunks(List<List<ParseTreeChunk>> sent1,
+		List<List<ParseTreeChunk>> sent2)
+	{
+		List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
+		// first iterate through the components (phrase types)
+		for (int comp = 0; comp < 2 && // just np & vp
+			comp < sent1.size() && comp < sent2.size(); comp++)
+		{
+			List<ParseTreeChunk> resultComps = new ArrayList<ParseTreeChunk>();
+			// then iterate through each phrase in each component
+			for (ParseTreeChunk ch1 : sent1.get(comp))
+			{
+				for (ParseTreeChunk ch2 : sent2.get(comp))
+				{ // simpler version
+					ParseTreeChunk chunkToAdd = parseTreeMatcher
+						.generalizeTwoGroupedPhrasesRandomSelectHighestScoreWithTransforms(ch1, ch2);
+
+					if (!lemmaFormManager.mustOccurVerifier(ch1, ch2, chunkToAdd))
+					{
+						continue; // if the words which have to stay do not stay, proceed to other elements
+					}
+					Boolean alreadyThere = false;
+					for (ParseTreeChunk chunk : resultComps)
+					{
+						if (chunk.equalsTo(chunkToAdd))
+						{
+							alreadyThere = true;
+							break;
+						}
+
+						if (parseTreeMatcher.generalizeTwoGroupedPhrasesRandomSelectHighestScore(chunk, chunkToAdd)
+							.equalsTo(chunkToAdd))
+						{
+							alreadyThere = true;
+							break;
+						}
+					}
+
+					if (!alreadyThere)
+					{
+						resultComps.add(chunkToAdd);
+					}
+
+					List<ParseTreeChunk> resultCompsReduced = generalizationListReducer
+						.applyFilteringBySubsumption(resultComps);
+					// if (resultCompsReduced.size() != resultComps.size())
+					// System.out.println("reduction of gen list occurred");
+				}
+			}
+			results.add(resultComps);
+		}
+
+		return results;
+	}
+
+	public Boolean equals(ParseTreeChunk ch)
+	{
+		List<String> lems = ch.getLemmas();
+		List<String> poss = ch.POSs;
+
+		if (this.lemmas.size() <= lems.size())
+			return false; // sub-chunk should be shorter than chunk
+
+		for (int i = 0; i < lems.size() && i < this.lemmas.size(); i++)
+		{
+			if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(poss.get(i))))
+				return false;
+		}
+		return true;
+	}
+
+	// 'this' is the super-chunk of ch; ch is a sub-chunk of 'this'
+	public Boolean isASubChunk(ParseTreeChunk ch)
+	{
+		List<String> lems = ch.getLemmas();
+		List<String> poss = ch.POSs;
+
+		if (this.lemmas.size() < lems.size())
+			return false; // sub-chunk should be shorter than chunk
+
+		for (int i = 0; i < lems.size() && i < this.lemmas.size(); i++)
+		{
+			if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(poss.get(i))))
+				return false;
+		}
+		return true;
+	}
+
+	public Boolean equalsTo(ParseTreeChunk ch)
+	{
+		List<String> lems = ch.getLemmas();
+		List<String> poss = ch.POSs;
+		if (this.lemmas.size() != lems.size() || this.POSs.size() != poss.size())
+			return false;
+
+		for (int i = 0; i < lems.size(); i++)
+		{
+			if (!(this.lemmas.get(i).equals(lems.get(i)) && this.POSs.get(i).equals(poss.get(i))))
+				return false;
+		}
+
+		return true;
+	}
+
+	public String toString()
+	{
+		String buf = " [";
+		if (mainPOS != null)
+			buf = mainPOS + " [";
+		for (int i = 0; i < lemmas.size() && i < POSs.size() // && i<=3
+		; i++)
+		{
+			buf += POSs.get(i) + "-" + lemmas.get(i) + " ";
+		}
+		return buf + "]";
+	}
+
+	public int compareTo(ParseTreeChunk o)
+	{
+		if (this.size > o.size)
+			return -1;
+		else
+			return 1;
+
+	}
+
+	public String listToString(List<List<ParseTreeChunk>> chunks)
+	{
+		StringBuffer buf = new StringBuffer();
+		if (chunks.get(0).size() > 0)
+		{
+			buf.append(" np " + chunks.get(0).toString());
+		}
+		if (chunks.get(1).size() > 0)
+		{
+			buf.append(" vp " + chunks.get(1).toString());
+		}
+		if (chunks.size() < 3)
+		{
+			return buf.toString();
+		}
+		if (chunks.get(2).size() > 0)
+		{
+			buf.append(" prp " + chunks.get(2).toString());
+		}
+		if (chunks.get(3).size() > 0)
+		{
+			buf.append(" pp " + chunks.get(3).toString());
+		}
+		if (chunks.get(4).size() > 0)
+		{
+			buf.append(" adjp " + chunks.get(4).toString());
+		}
+		if (chunks.get(5).size() > 0)
+		{
+			buf.append(" whadvp " + chunks.get(5).toString());
+		}
+		/*
+		 * if (mainPos.equals("np")) np.add(ch); else if (mainPos.equals( "vp")) vp.add(ch); else if (mainPos.equals(
+		 * "prp")) prp.add(ch); else if (mainPos.equals( "pp")) pp.add(ch); else if (mainPos.equals( "adjp"))
+		 * adjp.add(ch); else if (mainPos.equals( "whadvp")) whadvp.add(ch);
+		 */
+		return buf.toString();
+	}
+
+	public List<List<ParseTreeChunk>> obtainParseTreeChunkListByParsingList(String toParse)
+	{
+		List<List<ParseTreeChunk>> results = new ArrayList<List<ParseTreeChunk>>();
+		// if (toParse.endsWith("]]]")){
+		// toParse = toParse.replace("[[","").replace("]]","");
+		// }
+		toParse = toParse.replace(" ]], [ [", "&");
+		String[] phraseTypeFragments = toParse.trim().split("&");
+		for (String toParseFragm : phraseTypeFragments)
+		{
+			toParseFragm = toParseFragm.replace("],  [", "#");
+
+			List<ParseTreeChunk> resultsPhraseType = new ArrayList<ParseTreeChunk>();
+			String[] indivChunks = toParseFragm.trim().split("#");
+			for (String expr : indivChunks)
+			{
+				List<String> lems = new ArrayList<String>(), poss = new ArrayList<String>();
+				expr = expr.replace("[", "").replace(" ]", "");
+				String[] pairs = expr.trim().split(" ");
+				for (String word : pairs)
+				{
+					word = word.replace("]]", "").replace("]", "");
+					String[] pos_lem = word.split("-");
+					lems.add(pos_lem[1].trim());
+					poss.add(pos_lem[0].trim());
+				}
+				ParseTreeChunk ch = new ParseTreeChunk();
+				ch.setLemmas(lems);
+				ch.setPOSs(poss);
+				resultsPhraseType.add(ch);
+			}
+			results.add(resultsPhraseType);
+		}
+		System.out.println(results);
+		return results;
+
+		// 2.1 | Vietnam <b>embassy</b> <b>in</b> <b>Israel</b>: information on how to get your <b>visa</b> at Vietnam
+		// <b>embassy</b> <b>in</b> <b>Israel</b>. <b>...</b> <b>Spain</b>. Scotland. Sweden. Slovakia. Switzerland. T
+		// [Top of Page] <b>...</b>
+		// [[ [NN-* IN-in NP-israel ], [NP-* IN-in NP-israel ], [NP-* IN-* TO-* NN-* ], [NN-visa IN-* NN-* IN-in ]], [
+		// [VB-get NN-visa IN-* NN-* IN-in .-* ], [VBD-* IN-* NN-* NN-* .-* ], [VB-* NP-* ]]]
+
+	}
+
+	public void setMainPOS(String mainPOS)
+	{
+		this.mainPOS = mainPOS;
+	}
+
+	public String getMainPOS()
+	{
+		return mainPOS;
+	}
+
+	public List<String> getLemmas()
+	{
+		return lemmas;
+	}
+
+	public void setLemmas(List<String> lemmas)
+	{
+		this.lemmas = lemmas;
+	}
+
+	public List<String> getPOSs()
+	{
+		return POSs;
+	}
+
+	public void setPOSs(List<String> pOSs)
+	{
+		POSs = pOSs;
+	}
+
+	public ParseTreeMatcher getParseTreeMatcher()
+	{
+		return parseTreeMatcher;
+	}
 
 }
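
Note on the chunk comparisons above: equalsTo requires identical POS-lemma pairs, while isASubChunk accepts ch when its pairs form a prefix of this chunk's pairs. The following is a minimal stand-alone sketch of those two checks, not code from this commit. The class SubChunkSketch and its method names are hypothetical, and the extra size-consistency guard is an assumption added for safety.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the committed sources.
final class SubChunkSketch {

    // True when both chunks carry exactly the same lemmas and POS tags (the idea behind equalsTo).
    static boolean sameChunk(List<String> lemmasA, List<String> possA,
                             List<String> lemmasB, List<String> possB) {
        return lemmasA.equals(lemmasB) && possA.equals(possB);
    }

    // True when (subLemmas, subPoss) is a prefix of (lemmas, poss) (the idea behind isASubChunk).
    static boolean isPrefixChunk(List<String> lemmas, List<String> poss,
                                 List<String> subLemmas, List<String> subPoss) {
        // Assumption: each chunk must have one POS tag per lemma.
        if (lemmas.size() != poss.size() || subLemmas.size() != subPoss.size()) {
            return false;
        }
        // A sub-chunk cannot be longer than the chunk that contains it.
        if (lemmas.size() < subLemmas.size()) {
            return false;
        }
        for (int i = 0; i < subLemmas.size(); i++) {
            if (!lemmas.get(i).equals(subLemmas.get(i)) || !poss.get(i).equals(subPoss.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> lemmas = Arrays.asList("great", "pizza", "deals");
        List<String> poss = Arrays.asList("JJ", "NN", "NNS");
        // Leading pairs "JJ-great NN-pizza" form a prefix of the full chunk.
        System.out.println(isPrefixChunk(lemmas, poss,
            Arrays.asList("great", "pizza"), Arrays.asList("JJ", "NN")));
        // "NN-pizza" alone starts in the middle, so it is not a prefix.
        System.out.println(isPrefixChunk(lemmas, poss,
            Arrays.asList("pizza"), Arrays.asList("NN")));
    }
}
```

Like the committed method, the sketch only recognises sub-chunks that start at the first word; a phrase contained in the middle of a longer chunk is not reported.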

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunkListScorer.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunkListScorer.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunkListScorer.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeChunkListScorer.java Thu Oct 20 21:28:45 2011
@@ -19,9 +19,6 @@ package opennlp.tools.textsimilarity;
 
 import java.util.List;
 
-import org.springframework.stereotype.Component;
-
-@Component
 public class ParseTreeChunkListScorer {
   // find the single expression with the highest score
   public double getParseTreeChunkListScore(

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcher.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcher.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcher.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcher.java Thu Oct 20 21:28:45 2011
@@ -21,14 +21,6 @@ import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
 
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.stereotype.Component;
-
-/**
- * Created by IntelliJ IDEA. User: boris Date: Feb 13, 2009 Time: 2:18:47 PM To
- * change this template use File | Settings | File Templates.
- */
-@Component
 public class ParseTreeMatcher {
 
   private static final int NUMBER_OF_ITERATIONS = 2;

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcherDeterministic.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcherDeterministic.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcherDeterministic.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParseTreeMatcherDeterministic.java Thu Oct 20 21:28:45 2011
@@ -20,10 +20,6 @@ package opennlp.tools.textsimilarity;
 import java.util.ArrayList;
 import java.util.List;
 
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.stereotype.Component;
-
-@Component
 public class ParseTreeMatcherDeterministic {
 
   private GeneralizationListReducer generalizationListReducer = new GeneralizationListReducer();

Added: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParserChunker2MatcherOlderOpenNLP.java.txt
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParserChunker2MatcherOlderOpenNLP.java.txt?rev=1187056&view=auto
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParserChunker2MatcherOlderOpenNLP.java.txt (added)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParserChunker2MatcherOlderOpenNLP.java.txt Thu Oct 20 21:28:45 2011
@@ -0,0 +1,399 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package opennlp.tools.textsimilarity;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import opennlp.tools.lang.english.SentenceDetector;
+import opennlp.tools.lang.english.Tokenizer;
+import opennlp.tools.lang.english.TreebankParser;
+import opennlp.tools.parser.Parse;
+import opennlp.tools.parser.chunking.Parser;
+import opennlp.tools.sentdetect.SentenceDetectorME;
+import opennlp.tools.util.Span;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.beans.factory.annotation.Autowired;
+
+public class ParserChunker2MatcherOlderOpenNLP {
+  public static final String resourcesDir = (System.getProperty("os.name")
+      .toLowerCase().indexOf("win") > -1 ? "C:/workspace/ZSearch/resources_external"
+      : "/var/search/solr-1.2/resources");
+  static private ParserChunker2MatcherOlderOpenNLP m_SyntMatcher = null;
+
+  private static final Logger LOG = LoggerFactory.getLogger(ParserChunker2MatcherOlderOpenNLP.class);
+
+  private SentenceDetectorME sentenceDetectorME = null;
+
+  private Tokenizer tokenizer = null;
+
+  private Parser parser = null;
+
+  private final boolean useTagDict = true;
+
+  private final boolean useCaseInsensitiveTagDict = false;
+
+  private final int beamSize = Parser.defaultBeamSize;
+
+  private final double advancePercentage = Parser.defaultAdvancePercentage;
+
+  private Map<String, List<List<ParseTreeChunk>>> parsingsCache = new HashMap<String, List<List<ParseTreeChunk>>>();
+
+  private ParseTreeChunkListScorer parseTreeChunkListScorer;
+
+  private ParseTreeMatcherDeterministic parseTreeMatcherDeterministic = new ParseTreeMatcherDeterministic();
+
+  /**
+   * Get the ParserChunker2MatcherOlderOpenNLP singleton instance.
+   * 
+   * @return the shared matcher, with models loaded from resourcesDir + "/models"
+   */
+  static public ParserChunker2MatcherOlderOpenNLP getInstance() {
+    String dir = resourcesDir + "/models";
+    if (m_SyntMatcher == null) {
+      m_SyntMatcher = new ParserChunker2MatcherOlderOpenNLP();
+
+      try {
+        m_SyntMatcher.loadOpenNLP(dir);
+      } catch (Exception e) {
+        LOG.error("Problem loading openNLP! ", e);
+      }
+    }
+    return m_SyntMatcher;
+  }
+
+  static public ParserChunker2MatcherOlderOpenNLP getInstance(String resourceDirSpec) {
+    String dir = resourceDirSpec + "/models";
+    if (m_SyntMatcher == null) {
+      m_SyntMatcher = new ParserChunker2MatcherOlderOpenNLP();
+
+      try {
+        m_SyntMatcher.loadOpenNLP(dir);
+      } catch (Exception e) {
+        e.printStackTrace();
+        LOG.error("Problem loading openNLP! ", e);
+      }
+    }
+    return m_SyntMatcher;
+  }
+
+  public ParserChunker2MatcherOlderOpenNLP() {
+    /*
+     * try { loadOpenNLP(resourcesDir); } catch (IOException e) {
+     * LOG.error("Problem loading openNLP! ", e); }
+     */
+  }
+
+  public ParserChunker2MatcherOlderOpenNLP(String resourcesDir) {
+    try {
+      loadOpenNLP(resourcesDir);
+    } catch (IOException e) {
+      LOG.error("Problem loading openNLP! ", e);
+    }
+  }
+
+  public ParserChunker2MatcherOlderOpenNLP(String resourcesDir, String language) {
+    try {
+      loadOpenNLP(resourcesDir, language);
+    } catch (IOException e) {
+      LOG.error("Problem loading openNLP! ", e);
+    }
+  }
+
+  protected void loadOpenNLP(String dir) throws IOException {
+    sentenceDetectorME = new SentenceDetector(dir
+        + "/sentdetect/EnglishSD.bin.gz");
+    tokenizer = new Tokenizer(dir + "/tokenize/EnglishTok.bin.gz");
+    parser = (Parser) TreebankParser.getParser(dir + "/parser", useTagDict,
+        useCaseInsensitiveTagDict, beamSize, advancePercentage);
+
+  }
+
+  protected void loadOpenNLP(String dir, String lang) throws IOException {
+    if (lang.equalsIgnoreCase("es")) {
+      sentenceDetectorME = new SentenceDetector(dir
+          + "/sentdetect/EnglishSD.bin.gz");
+      tokenizer = new Tokenizer(dir + "/tokenize/EnglishTok.bin.gz");
+      parser = (Parser) TreebankParser.getParser(dir + "/parser", useTagDict,
+          useCaseInsensitiveTagDict, beamSize, advancePercentage);
+    }
+  }
+
+  // TODO is synchronized needed here?
+  public synchronized Parse[] parseLine(String line, Parser p, double confidence) {
+    String[] tokens = tokenizer.tokenize(line);
+    // tokens = TextProcessor.fastTokenize(line, false).toArray(new String[0]);
+
+    StringBuilder sb = new StringBuilder();
+    for (String t : tokens)
+      sb.append(t).append(" ");
+
+    Parse[] ps = null;
+    try {
+      ps = TreebankParser.parseLine(sb.toString(), parser, 2);
+    } catch (Exception e) {
+      System.out.println("Problem parsing " + sb.toString());
+      e.printStackTrace(); // unable to parse for whatever reason
+    }
+    int i = 1;
+    for (; i < ps.length; i++) {
+      if (ps[i - 1].getProb() - ps[i].getProb() > confidence)
+        break;
+    }
+    if (i < ps.length) {
+      Parse[] retp = new Parse[i];
+      for (int j = 0; j < i; j++)
+        retp[j] = ps[j];
+      return retp;
+    } else
+      return ps;
+  }
+
+  // TODO is synchronized needed here?
+  protected synchronized Double[] getPhrasingAcceptabilityData(String line) {
+    int nParsings = 5;
+    String[] tokens = tokenizer.tokenize(line);
+    int numWords = tokens.length;
+    StringBuilder sb = new StringBuilder();
+    for (String t : tokens)
+      sb.append(t).append(" ");
+    Double result[] = new Double[5];
+
+    Parse[] ps = null;
+    try {
+      ps = TreebankParser.parseLine(sb.toString(), parser, nParsings);
+    } catch (Exception e) {
+      // unable to parse for whatever reason
+      for (int i = 0; i < result.length; i++) {
+        result[i] = -20.0;
+      }
+    }
+
+    for (int i = 0; i < ps.length; i++) {
+      result[i] = Math.abs(ps[i].getProb() / (double) numWords);
+    }
+    return result;
+  }
+
+  protected boolean allChildNodesArePOSTags(Parse p) {
+    Parse[] subParses = p.getChildren();
+    for (int pi = 0; pi < subParses.length; pi++)
+      if (!((Parse) subParses[pi]).isPosTag())
+        return false;
+    return true;
+  }
+
+  protected ArrayList<String> getNounPhrases(Parse p) {
+    ArrayList<String> nounphrases = new ArrayList<String>();
+
+    Parse[] subparses = p.getChildren();
+    for (int pi = 0; pi < subparses.length; pi++) {
+      // System.out.println("Processing Label: " + subparses[pi].getLabel());
+      // System.out.println("Processing Type: " + subparses[pi].getType());
+      if (subparses[pi].getType().equals("NP")
+          && allChildNodesArePOSTags(subparses[pi]))// &&
+      // ((Parse)subparses[pi]).getLabel()
+      // == "NP")
+      {
+        // System.out.println("Processing: " + subparses[pi].getLabel() +
+        // " as Chunk...");
+        Span _span = subparses[pi].getSpan();
+        nounphrases
+            .add(p.getText().substring(_span.getStart(), _span.getEnd()));
+      } else if (!((Parse) subparses[pi]).isPosTag())
+        nounphrases.addAll(getNounPhrases(subparses[pi]));
+    }
+
+    return nounphrases;
+  }
+
+  public List<LemmaPair> getAllPhrasesTWPairs(Parse p) {
+    List<String> nounphrases = new ArrayList<String>();
+    List<LemmaPair> LemmaPairs = new ArrayList<LemmaPair>();
+
+    Parse[] subparses = p.getChildren();
+    for (int pi = 0; pi < subparses.length; pi++) {
+      Span _span = subparses[pi].getSpan();
+
+      nounphrases.add(p.getText().substring(_span.getStart(), _span.getEnd()));
+      String expr = p.getText().substring(_span.getStart(), _span.getEnd());
+
+      // if (expr.indexOf(" ")>0)
+      LemmaPairs.add(new LemmaPair(subparses[pi].getType(), expr, _span
+          .getStart()));
+      if (!((Parse) subparses[pi]).isPosTag())
+        LemmaPairs.addAll(getAllPhrasesTWPairs(subparses[pi]));
+    }
+
+    return LemmaPairs;
+  }
+
+  protected List<List<ParseTreeChunk>> matchOrigSentences(String sent1,
+      String sent2) {
+    // with tokenizer now
+    Parse[] parses1 = parseLine(sent1, parser, 1);
+    Parse[] parses2 = parseLine(sent2, parser, 1);
+    List<LemmaPair> origChunks1 = getAllPhrasesTWPairs(parses1[0]);
+    List<LemmaPair> origChunks2 = getAllPhrasesTWPairs(parses2[0]);
+    System.out.println(origChunks1);
+    System.out.println(origChunks2);
+
+    ParseTreeChunk matcher = new ParseTreeChunk();
+    List<List<ParseTreeChunk>> matchResult = matcher
+        .matchTwoSentencesGivenPairLists(origChunks1, origChunks2);
+    return matchResult;
+  }
+
+  public List<List<ParseTreeChunk>> matchOrigSentencesCache(String sent1,
+      String sent2) {
+    sent1 = sent1.replace("'s", " 's").replace(":", " ");
+    sent2 = sent2.replace("'s", " 's").replace(":", " ");
+
+    ParseTreeChunk matcher = new ParseTreeChunk();
+    List<List<ParseTreeChunk>> sent1GrpLst = null, sent2GrpLst = null;
+
+    sent1GrpLst = parsingsCache.get(sent1);
+    if (sent1GrpLst == null) {
+      List<LemmaPair> origChunks1 = new ArrayList<LemmaPair>();
+      String[] sents1 = sentenceDetectorME.sentDetect(sent1);
+      for (String s1 : sents1) {
+        Parse[] parses1 = parseLine(s1, parser, 1);
+        origChunks1.addAll(getAllPhrasesTWPairs(parses1[0]));
+      }
+      List<ParseTreeChunk> chunk1List = matcher.buildChunks(origChunks1);
+      sent1GrpLst = matcher.groupChunksAsParses(chunk1List);
+      parsingsCache.put(sent1, sent1GrpLst);
+      System.out.println(origChunks1);
+      // System.out.println("=== Grouped chunks 1 "+ sent1GrpLst);
+    }
+    sent2GrpLst = parsingsCache.get(sent2);
+    if (sent2GrpLst == null) {
+      List<LemmaPair> origChunks2 = new ArrayList<LemmaPair>();
+      String[] sents2 = sentenceDetectorME.sentDetect(sent2);
+      for (String s2 : sents2) {
+        Parse[] parses2 = parseLine(s2, parser, 1);
+        origChunks2.addAll(getAllPhrasesTWPairs(parses2[0]));
+      }
+      List<ParseTreeChunk> chunk2List = matcher.buildChunks(origChunks2);
+      sent2GrpLst = matcher.groupChunksAsParses(chunk2List);
+      parsingsCache.put(sent2, sent2GrpLst);
+      System.out.println(origChunks2);
+      // System.out.println("=== Grouped chunks 2 "+ sent2GrpLst);
+    }
+
+    return parseTreeMatcherDeterministic
+        .matchTwoSentencesGroupedChunksDeterministic(sent1GrpLst, sent2GrpLst);
+
+  }
+
+  public SentencePairMatchResult assessRelevance(String minedSent1, String sent2) {
+    minedSent1 = minedSent1.replace("'s", " 's").replace(":", " ")
+        .replace("’s", " 's");
+    sent2 = sent2.replace("'s", " 's").replace(":", " ").replace("’s", " 's");
+
+    ParseTreeChunk matcher = new ParseTreeChunk();
+    List<List<ParseTreeChunk>> sent1GrpLst = null, sent2GrpLst = null;
+
+    // sent1GrpLst = parsingsCache.get(minedSent1);
+    // if (sent1GrpLst==null){
+    List<LemmaPair> origChunks1 = new ArrayList<LemmaPair>();
+    String[] sents1 = sentenceDetectorME.sentDetect(minedSent1);
+    for (String s1 : sents1) {
+      Parse[] parses1 = parseLine(s1, parser, 1);
+      origChunks1.addAll(getAllPhrasesTWPairs(parses1[0]));
+    }
+    List<ParseTreeChunk> chunk1List = matcher.buildChunks(origChunks1);
+    sent1GrpLst = matcher.groupChunksAsParses(chunk1List);
+    parsingsCache.put(minedSent1, sent1GrpLst);
+    // System.out.println(origChunks1);
+    // System.out.println("=== Grouped chunks 1 "+ sent1GrpLst);
+    // }
+    sent2GrpLst = parsingsCache.get(sent2);
+    if (sent2GrpLst == null) {
+      List<LemmaPair> origChunks2 = new ArrayList<LemmaPair>();
+      String[] sents2 = sentenceDetectorME.sentDetect(sent2);
+      for (String s2 : sents2) {
+        Parse[] parses2 = parseLine(s2, parser, 1);
+        origChunks2.addAll(getAllPhrasesTWPairs(parses2[0]));
+      }
+      List<ParseTreeChunk> chunk2List = matcher.buildChunks(origChunks2);
+      sent2GrpLst = matcher.groupChunksAsParses(chunk2List);
+      parsingsCache.put(sent2, sent2GrpLst);
+      // System.out.println(origChunks2);
+      // System.out.println("=== Grouped chunks 2 "+ sent2GrpLst);
+    }
+
+    ParseTreeMatcherDeterministic md = new ParseTreeMatcherDeterministic();
+    List<List<ParseTreeChunk>> res = md
+        .matchTwoSentencesGroupedChunksDeterministic(sent1GrpLst, sent2GrpLst);
+    return new SentencePairMatchResult(res, origChunks1);
+
+  }
+
+  public Map<String, List<LemmaPair>> findMappingBetweenSentencesOfAParagraphAndAClassReps(
+      String para1, String classStr) {
+    // profile of matches
+    List<List<List<ParseTreeChunk>>> matchResultPerSentence = new ArrayList<List<List<ParseTreeChunk>>>();
+
+    ParseTreeChunk matcher = new ParseTreeChunk();
+
+    String[] sents = sentenceDetectorME.sentDetect(para1);
+    String[] classSents = sentenceDetectorME.sentDetect(classStr);
+
+    List<List<LemmaPair>> parseSentList = new ArrayList<List<LemmaPair>>();
+    for (String s : sents) {
+      parseSentList.add(getAllPhrasesTWPairs((parseLine(s, parser, 1)[0])));
+    }
+
+    List<List<LemmaPair>> parseClassList = new ArrayList<List<LemmaPair>>();
+    for (String s : classSents) {
+      parseClassList.add(getAllPhrasesTWPairs((parseLine(s, parser, 1)[0])));
+    }
+
+    Map<String, List<LemmaPair>> sentence_bestClassRep = new HashMap<String, List<LemmaPair>>();
+    for (List<LemmaPair> chunksSent : parseSentList) {
+      Double maxScore = -1.0;
+      for (List<LemmaPair> chunksClass : parseClassList) {
+        List<List<ParseTreeChunk>> matchResult = matcher
+            .matchTwoSentencesGivenPairLists(chunksSent, chunksClass);
+        Double score = parseTreeChunkListScorer
+            .getParseTreeChunkListScore(matchResult);
+        if (score > maxScore) {
+          maxScore = score;
+          sentence_bestClassRep.put(chunksSent.toString(), chunksClass);
+        }
+      }
+    }
+    return sentence_bestClassRep;
+  }
+
+  public SentenceDetectorME getSentenceDetectorME() {
+    return sentenceDetectorME;
+  }
+
+  public Parser getParser() {
+    return parser;
+  }
+}
+
+// -Xms500M -Xmx500M

Propchange: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/ParserChunker2MatcherOlderOpenNLP.java.txt
------------------------------------------------------------------------------
    svn:mime-type = text/plain
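
Note on findMappingBetweenSentencesOfAParagraphAndAClassReps in the file above: for each sentence it keeps the class representative whose match score is strictly highest. A minimal, self-contained sketch of that selection pattern follows. The class name BestMatchSketch and the word-overlap score are assumptions made for illustration only; the commit scores parse-tree chunk matches with parseTreeChunkListScorer, which is not reproduced here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the commit.
final class BestMatchSketch {

  // Stand-in score (an assumption): number of distinct lower-cased words shared.
  static double overlapScore(String a, String b) {
    Set<String> wordsA = new HashSet<String>(Arrays.asList(a.toLowerCase().split("\\s+")));
    Set<String> wordsB = new HashSet<String>(Arrays.asList(b.toLowerCase().split("\\s+")));
    wordsA.retainAll(wordsB);
    return wordsA.size();
  }

  // For each sentence, keep the candidate with the strictly highest score.
  static Map<String, String> bestMatches(List<String> sentences, List<String> candidates) {
    Map<String, String> best = new LinkedHashMap<String, String>();
    for (String sentence : sentences) {
      double maxScore = -1.0;
      for (String candidate : candidates) {
        double score = overlapScore(sentence, candidate);
        if (score > maxScore) {
          maxScore = score;
          best.put(sentence, candidate);
        }
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<String> sentences = Arrays.asList("the engine is strong", "the seats are soft");
    List<String> candidates = Arrays.asList("strong engine", "soft seats");
    System.out.println(bestMatches(sentences, candidates));
  }
}
```

Because the comparison is strict (score > maxScore), ties go to the earlier candidate, which matches the behaviour of the loop in the committed method.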

Modified: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/TextProcessor.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/TextProcessor.java?rev=1187056&r1=1187055&r2=1187056&view=diff
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/TextProcessor.java (original)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/TextProcessor.java Thu Oct 20 21:28:45 2011
@@ -31,11 +31,12 @@ import java.util.Map;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
+import opennlp.tools.similarity.apps.utils.Pair;
+
 import org.apache.commons.lang.StringUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import com.zvents.ce.common.util.Pair;
 
 public class TextProcessor {
 
@@ -279,7 +280,12 @@ public class TextProcessor {
 
     return retVal;
   }
-
+  
+  public static String removePunctuation(String sentence){
+	  List<String> toks = fastTokenize( sentence, false);
+	  return toks.toString().replace('[', ' ').replace(']', ' ').replace(',', ' ').replace("  ", " ");
+  }
+  
   public static ArrayList<String> fastTokenize(String txt, boolean retainPunc) {
     ArrayList<String> tokens = new ArrayList<String>();
     if (StringUtils.isEmpty(txt)) {
@@ -484,10 +490,7 @@ public class TextProcessor {
       }
     }
 
-    Stemmer st = new Stemmer();
-    st.add(token.toCharArray(), token.length());
-    st.stem();
-    return st.toString();
+    return new PorterStemmer().stem(token);
   }
 
   public static String cleanToken(String token) {
@@ -532,10 +535,9 @@ public class TextProcessor {
 
   public static String stemTerm(String term) {
     term = stripToken(term);
-    Stemmer st = new Stemmer();
-    st.add(term.toCharArray(), term.length());
-    st.stem();
-    return st.toString();
+    PorterStemmer st = new PorterStemmer();
+    
+    return st.stem(term);
   }
 
   public static String generateFingerPrint(String s) {
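
Note on removePunctuation in the diff above: it rebuilds the sentence from the List.toString() form of its tokens by replacing brackets and commas with spaces, which can leave leading and trailing spaces. One possible alternative is to join the tokens directly. The sketch below assumes a simplified tokenizer (splitting on anything that is not a letter or digit) in place of fastTokenize, whose body is not shown in this diff and may treat apostrophes and similar characters differently. The class and method names are hypothetical, and String.join requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the commit.
final class RemovePunctuationSketch {

  // Simplified stand-in for fastTokenize(txt, false): keeps runs of letters and digits only.
  static List<String> simpleTokenize(String txt) {
    List<String> tokens = new ArrayList<String>();
    if (txt == null) {
      return tokens;
    }
    for (String piece : txt.split("[^\\p{L}\\p{N}]+")) {
      if (!piece.isEmpty()) {
        tokens.add(piece);
      }
    }
    return tokens;
  }

  // Joins tokens with single spaces, so no leading, trailing, or doubled spaces remain.
  static String removePunctuation(String sentence) {
    return String.join(" ", simpleTokenize(sentence));
  }

  public static void main(String[] args) {
    System.out.println(removePunctuation("Not to worry, with the 2cv."));
  }
}
```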

Added: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java
URL: http://svn.apache.org/viewvc/incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java?rev=1187056&view=auto
==============================================================================
--- incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java (added)
+++ incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java Thu Oct 20 21:28:45 2011
@@ -0,0 +1,651 @@
+package opennlp.tools.textsimilarity.chunker2matcher;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.logging.Level;
+import java.util.logging.Logger;
+
+import org.apache.commons.lang.StringUtils;
+
+import opennlp.tools.chunker.ChunkerME;
+import opennlp.tools.chunker.ChunkerModel;
+import opennlp.tools.cmdline.parser.ParserTool;
+import opennlp.tools.parser.AbstractBottomUpParser;
+import opennlp.tools.parser.Parse;
+import opennlp.tools.parser.Parser;
+import opennlp.tools.parser.ParserFactory;
+import opennlp.tools.parser.ParserModel;
+import opennlp.tools.postag.POSModel;
+import opennlp.tools.postag.POSTagger;
+import opennlp.tools.postag.POSTaggerME;
+import opennlp.tools.sentdetect.SentenceDetector;
+import opennlp.tools.sentdetect.SentenceDetectorME;
+import opennlp.tools.sentdetect.SentenceModel;
+import opennlp.tools.textsimilarity.LemmaPair;
+import opennlp.tools.textsimilarity.ParseTreeChunk;
+import opennlp.tools.textsimilarity.ParseTreeMatcherDeterministic;
+import opennlp.tools.textsimilarity.SentencePairMatchResult;
+import opennlp.tools.textsimilarity.TextProcessor;
+import opennlp.tools.tokenize.Tokenizer;
+import opennlp.tools.tokenize.TokenizerME;
+import opennlp.tools.tokenize.TokenizerModel;
+import opennlp.tools.util.Sequence;
+import opennlp.tools.util.Span;
+import opennlp.tools.util.StringUtil;
+
+public class ParserChunker2MatcherProcessor {
+	private static final int MIN_SENTENCE_LENGTH = 10;
+	private static final String MODEL_DIR_KEY = "nlp.models.dir";
+	private static final String MODEL_DIR;
+	private static ParserChunker2MatcherProcessor instance;
+
+	private SentenceDetector sentenceDetector;
+	private Tokenizer tokenizer;
+	private POSTagger posTagger;
+	private Parser parser;
+	private ChunkerME chunker;
+	private final int NUMBER_OF_SECTIONS_IN_SENTENCE_CHUNKS = 5;
+	Logger logger = Logger.getLogger("opennlp.tools.textsimilarity.chunker2matcher.ParserChunker2MatcherProcessor");
+
+	static {
+		//TODO config
+		MODEL_DIR = "C:\\workspace\\similarity\\src\\main\\resources";
+	}
+
+	private ParserChunker2MatcherProcessor() {
+		initializeSentenceDetector();
+		initializeTokenizer();
+		initializePosTagger();
+		initializeParser();
+		initializeChunker();
+		
+	}
+
+	public synchronized static ParserChunker2MatcherProcessor getInstance() {
+		if (instance == null)
+			instance = new ParserChunker2MatcherProcessor();
+
+		return instance;
+	}
+
+	public List<List<Parse>> parseTextNlp(String text) {
+		if (text == null || text.trim().length() == 0)
+			return null;
+
+		List<List<Parse>> textParses = new ArrayList<List<Parse>>(1);
+
+		// parse paragraph by paragraph
+		String[] paragraphList = splitParagraph(text);
+		for (String paragraph : paragraphList) {
+			if (paragraph.length() == 0)
+				continue;
+
+			List<Parse> paragraphParses = parseParagraphNlp(paragraph);
+			if (paragraphParses != null)
+				textParses.add(paragraphParses);
+		}
+
+		return textParses;
+	}
+
+	public List<Parse> parseParagraphNlp(String paragraph) {
+		if (paragraph == null || paragraph.trim().length() == 0)
+			return null;
+
+		// normalize the text before parsing; otherwise, the sentences may
+		// not be separated correctly
+		// (the normalization call below is currently commented out)
+
+		//paragraph = TextNormalizer.normalizeText(paragraph);
+
+		// parse sentence by sentence
+		String[] sentences = splitSentences(paragraph);
+		List<Parse> parseList = new ArrayList<Parse>(sentences.length);
+		for (String sentence : sentences) {
+			sentence = sentence.trim();
+			if (sentence.length() == 0)
+				continue;
+
+			Parse sentenceParse = parseSentenceNlp(sentence, false);
+			if (sentenceParse != null)
+				parseList.add(sentenceParse);
+		}
+
+		return parseList;
+	}
+
+	public Parse parseSentenceNlp(String sentence) {
+		// if we parse an individual sentence, we want to normalize the text
+		// before parsing
+		return parseSentenceNlp(sentence, true);
+	}
+
+	public synchronized Parse parseSentenceNlp(String sentence,
+			boolean normalizeText) {
+		// don't try to parse a very short sentence: it carries little
+		// information and is most likely a heading
+		if (sentence == null || sentence.trim().length() < MIN_SENTENCE_LENGTH)
+			return null;
+
+		//if (normalizeText)
+		//	sentence = TextNormalizer.normalizeText(sentence);
+
+		Parse[] parseArray = null;
+		try {
+			parseArray = ParserTool.parseLine(sentence, parser, 1);
+		} catch (Throwable t) {
+			logger.log(Level.WARNING, "failed to parse the sentence: '" + sentence + "'", t);
+			return null;
+		}
+
+		// there should be only one result parse
+		if (parseArray != null && parseArray.length > 0)
+			return parseArray[0];
+		else
+			return null;
+	}
+
+
+	public synchronized List<List<ParseTreeChunk>> formGroupedPhrasesFromChunksForPara(String para){
+		List<List<ParseTreeChunk>> listOfChunksAccum = new ArrayList<List<ParseTreeChunk>>();
+		String[] sentences = splitSentences(para);
+		for(String sent: sentences){
+			List<List<ParseTreeChunk>> singleSentChunks = formGroupedPhrasesFromChunksForSentence(sent);
+			// skip sentences too short to chunk (the call above returns null for them)
+			if (singleSentChunks == null || singleSentChunks.size() != NUMBER_OF_SECTIONS_IN_SENTENCE_CHUNKS)
+				continue;
+			if (listOfChunksAccum.size()<1 ){
+				listOfChunksAccum = new ArrayList<List<ParseTreeChunk>>(singleSentChunks);
+			} else 
+				for(int i= 0; i<NUMBER_OF_SECTIONS_IN_SENTENCE_CHUNKS; i++){
+					List<ParseTreeChunk> phraseI = singleSentChunks.get(i);
+					List<ParseTreeChunk> phraseIaccum  = listOfChunksAccum.get(i);
+					phraseIaccum.addAll(phraseI);
+					listOfChunksAccum.set(i, phraseIaccum);
+				}
+		}
+
+		return listOfChunksAccum;
+	}
+
+
+	public synchronized List<List<ParseTreeChunk>> formGroupedPhrasesFromChunksForSentence(String sentence) {
+		if (sentence == null || sentence.trim().length() < MIN_SENTENCE_LENGTH)
+			return null;
+
+		sentence = TextProcessor.removePunctuation(sentence);
+
+		String[] toks = tokenizer.tokenize(sentence);
+		String[] tags = posTagger.tag(toks);
+		String[] res = chunker.chunk(toks, tags);
+		Span[] span =  chunker.chunkAsSpans(toks, tags);
+		Sequence[] seq = chunker.topKSequences(toks, tags);
+
+		// correct the chunk tags: always treat "is" as the start of a verb phrase
+		for(int i=0; i< toks.length; i++){
+			if (toks[i].equalsIgnoreCase("is")){
+				res[i] = "B-VP";
+			}
+		}
+
+		List<List<ParseTreeChunk>> listOfChunks = new ArrayList<List<ParseTreeChunk>>();
+		List<ParseTreeChunk> nounPhr = new ArrayList<ParseTreeChunk>(), 
+		prepPhr = new ArrayList<ParseTreeChunk>(), verbPhr  = new ArrayList<ParseTreeChunk>(), 
+		adjPhr  = new ArrayList<ParseTreeChunk>(), 
+		// to store the whole sentence
+		wholeSentence = new ArrayList<ParseTreeChunk>();
+
+		List<String> pOSsAll = new ArrayList<String>(), lemmasAll = new ArrayList<String>();
+
+		for(int i = 0; i< toks.length; i++){
+			pOSsAll.add(tags[i]);
+			lemmasAll.add(toks[i]);
+		}
+		wholeSentence.add(new ParseTreeChunk("SENTENCE", lemmasAll, pOSsAll));
+
+		boolean currPhraseClosed = false;
+		for(int i=0; i< res.length; i++){
+			String bi_POS = res[i];
+			currPhraseClosed = false;
+			if (bi_POS.startsWith("B-NP")){// beginning of a phrase
+
+				List<String> pOSs = new ArrayList<String>(), lemmas = new ArrayList<String>();
+				pOSs.add(tags[i]);
+				lemmas.add(toks[i]);
+				for(int j=i+1; j<res.length; j++){
+					if (res[j].startsWith("B-VP")){
+						nounPhr.add(new ParseTreeChunk("NP", lemmas, pOSs));
+						logger.info(i + " => " +lemmas);
+						currPhraseClosed = true;
+						break;
+					} else {
+						pOSs.add(tags[j]);
+						lemmas.add(toks[j]);
+					}
+				}
+				if (!currPhraseClosed){
+					nounPhr.add(new ParseTreeChunk("NP", lemmas, pOSs));
+					logger.info(i + " => " + lemmas);
+				}
+
+			} else if (bi_POS.startsWith("B-PP")){// beginning of a phrase
+				List<String> pOSs = new ArrayList<String>(), lemmas = new ArrayList<String>();
+				pOSs.add(tags[i]);
+				lemmas.add(toks[i]);
+
+				for(int j=i+1; j<res.length; j++){
+					if (res[j].startsWith("B-VP")){
+						prepPhr.add(new ParseTreeChunk("PP", lemmas, pOSs));
+						logger.info(i + " => " + lemmas);
+						currPhraseClosed = true;
+						break;
+					} else {
+						pOSs.add(tags[j]);
+						lemmas.add(toks[j]);
+					}
+				}
+				if (!currPhraseClosed){
+					prepPhr.add(new ParseTreeChunk("PP", lemmas, pOSs));
+					logger.info(i + " => " + lemmas);
+				}
+			} else
+				if (bi_POS.startsWith("B-VP")){// beginning of a phrase
+					List<String> pOSs = new ArrayList<String>(), lemmas = new ArrayList<String>();
+					pOSs.add(tags[i]);
+					lemmas.add(toks[i]);
+
+					for(int j=i+1; j<res.length; j++){
+						if (res[j].startsWith("B-VP")){
+							verbPhr.add(new ParseTreeChunk("VP", lemmas, pOSs));
+							logger.info(i + " => " +lemmas);
+							currPhraseClosed = true;
+							break;
+						} else {
+							pOSs.add(tags[j]);
+							lemmas.add(toks[j]);
+						}
+					}
+					if (!currPhraseClosed){
+						verbPhr.add(new ParseTreeChunk("VP", lemmas, pOSs));
+						logger.info(i + " => " + lemmas);
+					}
+				} else
+					if (bi_POS.startsWith("B-ADJP") ){// beginning of a phrase
+						List<String> pOSs = new ArrayList<String>(), lemmas = new ArrayList<String>();
+						pOSs.add(tags[i]);
+						lemmas.add(toks[i]);
+
+						for(int j=i+1; j<res.length; j++){
+							if (res[j].startsWith("B-VP")){
+								adjPhr.add(new ParseTreeChunk("ADJP", lemmas, pOSs));
+								logger.info(i + " => " +lemmas);
+								currPhraseClosed = true;
+								break;
+							} else {
+								pOSs.add(tags[j]);
+								lemmas.add(toks[j]);
+							}
+						}
+						if (!currPhraseClosed){
+							adjPhr.add(new ParseTreeChunk("ADJP", lemmas, pOSs));
+							logger.info(i + " => " + lemmas);
+						}
+					}
+		}
+		listOfChunks.add(nounPhr);
+		listOfChunks.add(verbPhr);
+		listOfChunks.add(prepPhr);
+		listOfChunks.add(adjPhr);
+		listOfChunks.add(wholeSentence);
+
+		return listOfChunks;
+	}
+
+	public static List<List<SentenceNode>> textToSentenceNodes(
+			List<List<Parse>> textParses) {
+		if (textParses == null || textParses.size() == 0)
+			return null;
+
+		List<List<SentenceNode>> textNodes = new ArrayList<List<SentenceNode>>(
+				textParses.size());
+		for (List<Parse> paragraphParses : textParses) {
+			List<SentenceNode> paragraphNodes = paragraphToSentenceNodes(paragraphParses);
+
+			// append paragraph node if any
+			if (paragraphNodes != null && paragraphNodes.size() > 0)
+				textNodes.add(paragraphNodes);
+		}
+
+		if (textNodes.size() > 0)
+			return textNodes;
+		else
+			return null;
+	}
+
+	public static List<SentenceNode> paragraphToSentenceNodes(
+			List<Parse> paragraphParses) {
+		if (paragraphParses == null || paragraphParses.size() == 0)
+			return null;
+
+		List<SentenceNode> paragraphNodes = new ArrayList<SentenceNode>(
+				paragraphParses.size());
+		for (Parse sentenceParse : paragraphParses) {
+			SentenceNode sentenceNode = null;
+			try {
+				sentenceNode = sentenceToSentenceNode(sentenceParse);
+			} catch (Exception e) {
+				// don't fail the whole paragraph when a single sentence fails
+				System.err.println("Failed to convert sentence to node. error: " + e);
+				sentenceNode = null;
+			}
+
+			if (sentenceNode != null)
+				paragraphNodes.add(sentenceNode);
+		}
+
+		if (paragraphNodes.size() > 0)
+			return paragraphNodes;
+		else
+			return null;
+	}
+
+	public static SentenceNode sentenceToSentenceNode(Parse sentenceParse) {
+		if (sentenceParse == null)
+			return null;
+
+		// convert the OpenNLP Parse to our own tree nodes
+		SyntacticTreeNode node = toSyntacticTreeNode(sentenceParse);
+		if ((node == null) || !(node instanceof SentenceNode))
+			return null;
+
+		SentenceNode sentenceNode = (SentenceNode) node;
+
+		// fix the parsing tree
+		fixParsingTree(sentenceNode);
+
+		return sentenceNode;
+	}
+
+	public List<List<SentenceNode>> parseTextNode(String text) {
+		List<List<Parse>> textParseList = parseTextNlp(text);
+		return textToSentenceNodes(textParseList);
+	}
+
+	public List<SentenceNode> parseParagraphNode(String paragraph) {
+		List<Parse> paragraphParseList = parseParagraphNlp(paragraph);
+		return paragraphToSentenceNodes(paragraphParseList);
+	}
+
+	public SentenceNode parseSentenceNode(String sentence) {
+		return parseSentenceNode(sentence, true);
+	}
+
+	public synchronized SentenceNode parseSentenceNode(String sentence,
+			boolean normalizeText) {
+		Parse sentenceParse = parseSentenceNlp(sentence, normalizeText);
+		return sentenceToSentenceNode(sentenceParse);
+	}
+
+	public String[] splitParagraph(String text) {
+		String[] res = text.split("\n");
+		if (res == null || res.length<=1)
+			return new String[] {text};
+		else 
+			return res;
+
+	}
+
+	public String[] splitSentences(String text) {
+		if (text == null)
+			return null;
+
+		return sentenceDetector.sentDetect(text);
+	}
+
+	public String[] tokenizeSentence(String sentence) {
+		if (sentence == null)
+			return null;
+
+		return tokenizer.tokenize(sentence);
+	}
+
+	private void initializeSentenceDetector() {
+		InputStream is = null;
+		try {
+			is = new FileInputStream(
+					MODEL_DIR + "/en-sent.bin"
+
+			);
+			SentenceModel model = new SentenceModel(is);
+			sentenceDetector = new SentenceDetectorME(model);
+		} catch (IOException e) {
+			e.printStackTrace();
+		} finally {
+			if (is != null) {
+				try {
+					is.close();
+				} catch (IOException e) {
+					e.printStackTrace();
+				}
+			}
+		}
+	}
+
+	private void initializeTokenizer() {
+		InputStream is = null;
+		try {
+			is = new FileInputStream(
+					MODEL_DIR+ "/en-token.bin"
+			);
+			TokenizerModel model = new TokenizerModel(is);
+			tokenizer = new TokenizerME(model);
+		} catch (IOException e) {
+			e.printStackTrace();
+		} finally {
+			if (is != null) {
+				try {
+					is.close();
+				} catch (IOException e) {
+				}
+			}
+		}
+	}
+
+	private void initializePosTagger() {
+		InputStream is = null;
+		try {
+			is = new FileInputStream(MODEL_DIR
+					+ "/en-pos-maxent.bin");
+			POSModel model = new POSModel(is);
+			posTagger = new POSTaggerME(model);
+		} catch (IOException e) {
+			e.printStackTrace();
+		} finally {
+			if (is != null) {
+				try {
+					is.close();
+				} catch (IOException e) {
+				}
+			}
+		}
+	}
+
+	private void initializeParser() {
+		InputStream is = null;
+		try {
+			is = new FileInputStream(MODEL_DIR
+					+ "/en-parser-chunking.bin");
+			ParserModel model = new ParserModel(is);
+			parser = ParserFactory.create(model);
+		} catch (IOException e) {
+			e.printStackTrace();
+		} finally {
+			if (is != null) {
+				try {
+					is.close();
+				} catch (IOException e) {
+				}
+			}
+		}
+	}
+
+	private void initializeChunker() {
+		InputStream is = null;
+		try {
+			is = new FileInputStream(MODEL_DIR
+					+ "/en-chunker.bin");
+			ChunkerModel model = new ChunkerModel(is);
+			chunker = new ChunkerME(model);
+		} catch (IOException e) {
+			e.printStackTrace();
+		} finally {
+			if (is != null) {
+				try {
+					is.close();
+				} catch (IOException e) {
+				}
+			}
+		}
+	}
+
+	/**
+	 * Converts an instance of Parse to a SyntacticTreeNode by filtering out
+	 * the unnecessary data and assigning the word for each node.
+	 * 
+	 * @param parse the OpenNLP parse to convert; may be null
+	 */
+	private static SyntacticTreeNode toSyntacticTreeNode(Parse parse) {
+		if (parse == null)
+			return null;
+
+		// check for junk types
+		String type = parse.getType();
+		if (SyntacticTreeNode.isJunkType(type))
+			return null;
+
+		String text = parse.getText();
+		ArrayList<SyntacticTreeNode> childrenNodeList = convertChildrenNodes(parse);
+
+		// check sentence node, the node contained in the top node
+		if (type.equals(AbstractBottomUpParser.TOP_NODE)
+				&& childrenNodeList != null && childrenNodeList.size() > 0) {
+			PhraseNode rootNode = (PhraseNode) childrenNodeList.get(0);
+			return new SentenceNode(text, rootNode.getChildren());
+		}
+
+		// if this node contains children nodes, then it is a phrase node
+		if (childrenNodeList != null && childrenNodeList.size() > 0) {
+			return new PhraseNode(type, childrenNodeList);
+		}
+
+		// otherwise, it is a word node
+		Span span = parse.getSpan();
+		String word = text.substring(span.getStart(), span.getEnd()).trim();
+
+		return new WordNode(type, word);
+	}
+
+	private static ArrayList<SyntacticTreeNode> convertChildrenNodes(Parse parse) {
+		if (parse == null)
+			return null;
+
+		Parse[] children = parse.getChildren();
+		if (children == null || children.length == 0)
+			return null;
+
+		ArrayList<SyntacticTreeNode> childrenNodeList = new ArrayList<SyntacticTreeNode>();
+		for (Parse child : children) {
+			SyntacticTreeNode childNode = toSyntacticTreeNode(child);
+			if (childNode != null)
+				childrenNodeList.add(childNode);
+		}
+
+		return childrenNodeList;
+	}
+
+	private static void fixParsingTree(SentenceNode sentenceNode) {
+		// logger.finest("before = " + sentenceNode);
+		//	for (ParsingTreeFixer fixer : FIXER_LIST) {
+		//		fixer.fix(sentenceNode);
+		//	}
+		// logger.finest("after = " + sentenceNode);
+	}
+
+	public SentencePairMatchResult assessRelevance(String para1, String para2)
+	{
+		ParserChunker2MatcherProcessor parser = ParserChunker2MatcherProcessor.getInstance();
+		List<List<ParseTreeChunk>> sent1GrpLst = parser.formGroupedPhrasesFromChunksForPara(para1), 
+		sent2GrpLst = parser.formGroupedPhrasesFromChunksForPara(para2);
+
+		List<LemmaPair> origChunks1 = listListParseTreeChunk2ListLemmaPairs(sent1GrpLst); //TODO  need to populate it!
+
+
+		ParseTreeMatcherDeterministic md = new ParseTreeMatcherDeterministic();
+		List<List<ParseTreeChunk>> res = md.matchTwoSentencesGroupedChunksDeterministic(sent1GrpLst, sent2GrpLst);
+		return new SentencePairMatchResult(res, origChunks1);
+
+	}
+	private List<LemmaPair> listListParseTreeChunk2ListLemmaPairs(
+			List<List<ParseTreeChunk>> sent1GrpLst) {
+		List<ParseTreeChunk> wholeSentence = sent1GrpLst.get(sent1GrpLst.size()-1); // whole sentence is last list in the list of lists
+		List<LemmaPair>  results = new ArrayList<LemmaPair>();
+		List<String> pOSs = wholeSentence.get(0).getPOSs();
+		List<String> lemmas = wholeSentence.get(0).getLemmas();
+		for(int i= 0; i< lemmas.size(); i++){
+			results.add(new LemmaPair( pOSs.get(i), lemmas.get(i), i  ));
+		}
+
+		return results;
+	}
+
+	public static void main(String[] args) throws Exception {
+
+
+
+		/*
+		 * String text =
+		 * "I have been driving a 96 accord to death for 10 years.  " +
+		 * "Lately it has been costing to much in repairs.  " +
+		 * "I am looking for something 8,000-13,000.  " +
+		 * "My last three vehicles have been Accords.  " +
+		 * "I like them but I would like something different this time.";
+		 */
+		/*
+		 * String text = "I love Fresh body styling. " + "I love lots of grip. "
+		 * + "I love strong engine and grippy tires. " + "I like Head turner. "
+		 * + "I like Right and left rearward blind spots. " +
+		 * "I like Great acceleration. " + "I like great noise. " +
+		 * "I like great brakes. " + "I like cheap feeling interior. " +
+		 * "I like uncomfortable seats. " + "I like nav system hard to read.";
+		 */
+		// String sentence = "I love Fresh body styling";
+		// String phrase = "I captures way more detail in high contrast scenes";
+		String phrase1 = "Its classy design and the Mercedes name make it a very cool vehicle to drive. "
+			+ "The engine makes it a powerful car. "
+			+ "The strong engine gives it enough power. "
+			+ "The strong engine gives the car a lot of power.";
+		String phrase2 = "This car has a great engine. "
+			+ "This car has an amazingly good engine. "
+			+ "This car provides you a very good mileage.";
+		String sentence = "Not to worry with the 2cv.";
+		ParserChunker2MatcherProcessor parser = ParserChunker2MatcherProcessor.getInstance();
+
+		System.out.println(parser.assessRelevance(phrase1, phrase2));
+
+
+		parser.formGroupedPhrasesFromChunksForSentence("Its classy design and the Mercedes name make it a very cool vehicle to drive. ");
+		parser.formGroupedPhrasesFromChunksForSentence("Sounds too good to be true but it actually is, the world's first flying car is finally here. ");
+		parser.formGroupedPhrasesFromChunksForSentence("UN Ambassador Ron Prosor repeated the Israeli position that the only way the Palestinians will get UN membership and statehood is through direct negotiations with the Israelis on a comprehensive peace agreement");
+
+		List<List<SentenceNode>> nodeListList = parser.parseTextNode(phrase1);
+		for (List<SentenceNode> nodeList : nodeListList) {
+			for (SentenceNode node : nodeList) {
+				System.out.println(node);
+			}
+		}
+	}
+}

Propchange: incubator/opennlp/sandbox/opennlp-similarity/src/main/java/opennlp/tools/textsimilarity/chunker2matcher/ParserChunker2MatcherProcessor.java
------------------------------------------------------------------------------
    svn:mime-type = text/plain
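
Note on formGroupedPhrasesFromChunksForSentence in the file above: it starts a phrase at each B-NP, B-PP, B-VP, or B-ADJP tag and extends it until the next B-VP tag or the end of the sentence, so the resulting phrases can overlap. For comparison, the sketch below shows the conventional reading of chunker output, in which a phrase is a B-X tag followed by its I-X continuation tags. This is a swapped-in technique shown for illustration, not the method used in the commit. The class and method names are hypothetical, and the tags in main are hand-written rather than produced by a ChunkerME model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the commit.
final class BioGroupingSketch {

  // Returns phrases formatted as "TYPE: tok tok ...", one per B-X/I-X run.
  static List<String> group(String[] toks, String[] chunkTags) {
    List<String> phrases = new ArrayList<String>();
    StringBuilder current = null;
    String currentType = null;
    for (int i = 0; i < toks.length; i++) {
      String tag = chunkTags[i];
      boolean continues = current != null && tag.equals("I-" + currentType);
      if (continues) {
        current.append(' ').append(toks[i]);
        continue;
      }
      // any other tag closes the phrase that is currently open
      if (current != null) {
        phrases.add(current.toString());
        current = null;
        currentType = null;
      }
      // a B-X tag opens a new phrase of type X; O and stray I-X tags open nothing
      if (tag.startsWith("B-")) {
        currentType = tag.substring(2);
        current = new StringBuilder(currentType).append(": ").append(toks[i]);
      }
    }
    if (current != null) {
      phrases.add(current.toString());
    }
    return phrases;
  }

  public static void main(String[] args) {
    String[] toks = {"The", "strong", "engine", "gives", "it", "power"};
    String[] tags = {"B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "B-NP"};
    for (String phrase : group(toks, tags)) {
      System.out.println(phrase);
    }
  }
}
```

Grouping this way yields non-overlapping phrases, whereas the committed approach deliberately lets a noun or prepositional phrase absorb everything up to the next verb phrase; which of the two suits the similarity matcher better is not settled by this sketch.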