Posted to commits@lucene.apache.org by da...@apache.org on 2017/12/27 15:04:06 UTC

[13/54] [abbrv] lucene-solr:jira/solr-11702: LUCENE-2899: Add OpenNLP Analysis capabilities as a module

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/lucene/licenses/opennlp-tools-LICENSE-ASL.txt
----------------------------------------------------------------------
diff --git a/lucene/licenses/opennlp-tools-LICENSE-ASL.txt b/lucene/licenses/opennlp-tools-LICENSE-ASL.txt
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/lucene/licenses/opennlp-tools-LICENSE-ASL.txt
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/lucene/licenses/opennlp-tools-NOTICE.txt
----------------------------------------------------------------------
diff --git a/lucene/licenses/opennlp-tools-NOTICE.txt b/lucene/licenses/opennlp-tools-NOTICE.txt
new file mode 100644
index 0000000..68a08dc
--- /dev/null
+++ b/lucene/licenses/opennlp-tools-NOTICE.txt
@@ -0,0 +1,6 @@
+
+Apache OpenNLP Tools
+Copyright 2015 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/lucene/module-build.xml
----------------------------------------------------------------------
diff --git a/lucene/module-build.xml b/lucene/module-build.xml
index d48ae37..c2159b6 100644
--- a/lucene/module-build.xml
+++ b/lucene/module-build.xml
@@ -285,6 +285,28 @@
     <property name="analyzers-icu-javadocs.uptodate" value="true"/>
   </target>
 
+  <property name="analyzers-opennlp.jar" value="${common.dir}/build/analysis/opennlp/lucene-analyzers-opennlp-${version}.jar"/>
+  <target name="check-analyzers-opennlp-uptodate" unless="analyzers-opennlp.uptodate">
+    <module-uptodate name="analysis/opennlp" jarfile="${analyzers-opennlp.jar}" property="analyzers-opennlp.uptodate"/>
+  </target>
+  <target name="jar-analyzers-opennlp" unless="analyzers-opennlp.uptodate" depends="check-analyzers-opennlp-uptodate">
+    <ant dir="${common.dir}/analysis/opennlp" target="jar-core" inheritAll="false">
+      <propertyset refid="uptodate.and.compiled.properties"/>
+    </ant>
+    <property name="analyzers-opennlp.uptodate" value="true"/>
+  </target>
+
+  <property name="analyzers-opennlp-javadoc.jar" value="${common.dir}/build/analysis/opennlp/lucene-analyzers-opennlp-${version}-javadoc.jar"/>
+  <target name="check-analyzers-opennlp-javadocs-uptodate" unless="analyzers-opennlp-javadocs.uptodate">
+    <module-uptodate name="analysis/opennlp" jarfile="${analyzers-opennlp-javadoc.jar}" property="analyzers-opennlp-javadocs.uptodate"/>
+  </target>
+  <target name="javadocs-analyzers-opennlp" unless="analyzers-opennlp-javadocs.uptodate" depends="check-analyzers-opennlp-javadocs-uptodate">
+    <ant dir="${common.dir}/analysis/opennlp" target="javadocs" inheritAll="false">
+      <propertyset refid="uptodate.and.compiled.properties"/>
+    </ant>
+    <property name="analyzers-opennlp-javadocs.uptodate" value="true"/>
+  </target>
+
   <property name="analyzers-phonetic.jar" value="${common.dir}/build/analysis/phonetic/lucene-analyzers-phonetic-${version}.jar"/>
   <target name="check-analyzers-phonetic-uptodate" unless="analyzers-phonetic.uptodate">
     <module-uptodate name="analysis/phonetic" jarfile="${analyzers-phonetic.jar}" property="analyzers-phonetic.uptodate"/>

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
----------------------------------------------------------------------
diff --git a/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
index 070eab2..3e1e375 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java
@@ -41,6 +41,7 @@ import org.apache.lucene.util.Attribute;
 import org.apache.lucene.util.AttributeFactory;
 import org.apache.lucene.util.AttributeImpl;
 import org.apache.lucene.util.AttributeReflector;
+import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefBuilder;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.IntsRef;
@@ -127,7 +128,7 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
   //     lastStartOffset)
   public static void assertTokenStreamContents(TokenStream ts, String[] output, int startOffsets[], int endOffsets[], String types[], int posIncrements[],
                                                int posLengths[], Integer finalOffset, Integer finalPosInc, boolean[] keywordAtts,
-                                               boolean offsetsAreCorrect) throws IOException {
+                                               boolean offsetsAreCorrect, byte[][] payloads) throws IOException {
     assertNotNull(output);
     CheckClearAttributesAttribute checkClearAtt = ts.addAttribute(CheckClearAttributesAttribute.class);
     
@@ -166,6 +167,12 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
       assertTrue("has no KeywordAttribute", ts.hasAttribute(KeywordAttribute.class));
       keywordAtt = ts.getAttribute(KeywordAttribute.class);
     }
+
+    PayloadAttribute payloadAtt = null;
+    if (payloads != null) {
+      assertTrue("has no PayloadAttribute", ts.hasAttribute(PayloadAttribute.class));
+      payloadAtt = ts.getAttribute(PayloadAttribute.class);
+    }
     
     // Maps position to the start/end offset:
     final Map<Integer,Integer> posToStartOffset = new HashMap<>();
@@ -185,6 +192,7 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
       if (posIncrAtt != null) posIncrAtt.setPositionIncrement(45987657);
       if (posLengthAtt != null) posLengthAtt.setPositionLength(45987653);
       if (keywordAtt != null) keywordAtt.setKeyword((i&1) == 0);
+      if (payloadAtt != null) payloadAtt.setPayload(new BytesRef(new byte[] { 0x00, -0x21, 0x12, -0x43, 0x24 }));
       
       checkClearAtt.getAndResetClearCalled(); // reset it, because we called clearAttribute() before
       assertTrue("token "+i+" does not exist", ts.incrementToken());
@@ -209,7 +217,14 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
       if (keywordAtts != null) {
         assertEquals("keywordAtt " + i + " term=" + termAtt, keywordAtts[i], keywordAtt.isKeyword());
       }
-      
+      if (payloads != null) {
+        if (payloads[i] != null) {
+          assertEquals("payloads " + i, new BytesRef(payloads[i]), payloadAtt.getPayload());
+        } else {
+          assertNull("payloads " + i, payloadAtt.getPayload());
+        }
+      }
+
       // we can enforce some basic things about a few attributes even if the caller doesn't check:
       if (offsetAtt != null) {
         final int startOffset = offsetAtt.startOffset();
@@ -283,7 +298,9 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
     if (typeAtt != null) typeAtt.setType("bogusType");
     if (posIncrAtt != null) posIncrAtt.setPositionIncrement(45987657);
     if (posLengthAtt != null) posLengthAtt.setPositionLength(45987653);
-    
+    if (keywordAtt != null) keywordAtt.setKeyword(true);
+    if (payloadAtt != null) payloadAtt.setPayload(new BytesRef(new byte[] { 0x00, -0x21, 0x12, -0x43, 0x24 }));
+
     checkClearAtt.getAndResetClearCalled(); // reset it, because we called clearAttribute() before
 
     ts.end();
@@ -305,7 +322,7 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
   public static void assertTokenStreamContents(TokenStream ts, String[] output, int startOffsets[], int endOffsets[], String types[], int posIncrements[],
                                                int posLengths[], Integer finalOffset, boolean[] keywordAtts,
                                                boolean offsetsAreCorrect) throws IOException {
-    assertTokenStreamContents(ts, output, startOffsets, endOffsets, types, posIncrements, posLengths, finalOffset, null, null, offsetsAreCorrect);
+    assertTokenStreamContents(ts, output, startOffsets, endOffsets, types, posIncrements, posLengths, finalOffset, null, keywordAtts, offsetsAreCorrect, null);
   }
 
   public static void assertTokenStreamContents(TokenStream ts, String[] output, int startOffsets[], int endOffsets[], String types[], int posIncrements[], int posLengths[], Integer finalOffset, boolean offsetsAreCorrect) throws IOException {
@@ -373,7 +390,12 @@ public abstract class BaseTokenStreamTestCase extends LuceneTestCase {
     checkAnalysisConsistency(random(), a, true, input, offsetsAreCorrect);
     assertTokenStreamContents(a.tokenStream("dummy", input), output, startOffsets, endOffsets, types, posIncrements, posLengths, input.length(), offsetsAreCorrect);
   }
-  
+
+  public static void assertAnalyzesTo(Analyzer a, String input, String[] output, int startOffsets[], int endOffsets[], String types[], int posIncrements[], int posLengths[], boolean offsetsAreCorrect, byte[][] payloads) throws IOException {
+    checkResetException(a, input);
+    assertTokenStreamContents(a.tokenStream("dummy", input), output, startOffsets, endOffsets, types, posIncrements, posLengths, input.length(), null, null, offsetsAreCorrect, payloads);
+  }
+
   public static void assertAnalyzesTo(Analyzer a, String input, String[] output) throws IOException {
     assertAnalyzesTo(a, input, output, null, null, null, null, null);
   }

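The new `payloads` parameter added to `assertTokenStreamContents` above follows a per-token convention: a non-null entry must equal that token's payload bytes exactly, while a null entry means the payload for that token is simply not checked. The following standalone sketch (plain Java, no Lucene dependencies; the class and method names are illustrative, not part of this commit) shows just that null-skipping comparison logic:

```java
import java.util.Arrays;

public class PayloadCheckSketch {

  // Mirrors the new payloads check in assertTokenStreamContents:
  // expected[i] == null means "don't verify token i's payload";
  // a non-null entry must match the actual payload byte-for-byte.
  static void checkPayloads(byte[][] expected, byte[][] actual) {
    for (int i = 0; i < expected.length; i++) {
      if (expected[i] != null && !Arrays.equals(expected[i], actual[i])) {
        throw new AssertionError("payloads " + i);
      }
    }
  }

  public static void main(String[] args) {
    byte[][] expected = { { 0x01 }, null };        // second token unchecked
    byte[][] actual   = { { 0x01 }, { 0x7f } };
    checkPayloads(expected, actual);               // passes despite mismatch at index 1
    System.out.println("ok");
  }
}
```

Passing `null` for the whole `payloads` array (as the delegating overloads do) disables payload checking entirely, which keeps all pre-existing callers source- and behavior-compatible.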
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/CHANGES.txt
----------------------------------------------------------------------
diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt
index a3f6f75..e60262d 100644
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -53,6 +53,13 @@ New Features
 ----------------------
 * SOLR-11285: Simulation framework for autoscaling. (ab)
 
+* LUCENE-2899: In the Solr analysis-extras contrib, added support for the
+  OpenNLP-based analysis components in the Lucene analysis/opennlp module:
+  tokenization, part-of-speech tagging, phrase chunking, and lemmatization.
+  Also added OpenNLP-based named entity extraction as a Solr update request
+  processor.  (Lance Norskog, Grant Ingersoll, Joern Kottmann, Em, Kai Gülzau,
+  Rene Nederhand, Robert Muir, Steven Bower, Steve Rowe)
+
 Optimizations
 ----------------------
 

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/README.txt
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/README.txt b/solr/contrib/analysis-extras/README.txt
index 3863420..fb8381a 100644
--- a/solr/contrib/analysis-extras/README.txt
+++ b/solr/contrib/analysis-extras/README.txt
@@ -1,8 +1,10 @@
 The analysis-extras plugin provides additional analyzers that rely
 upon large dependencies/dictionaries.
 
-It includes integration with ICU for multilingual support, and 
-analyzers for Chinese and Polish.
+It includes integration with ICU for multilingual support,
+analyzers for Chinese and Polish, and integration with
+OpenNLP for multilingual tokenization, part-of-speech tagging,
+lemmatization, phrase chunking, and named-entity recognition.
 
 ICU relies upon lucene-libs/lucene-analyzers-icu-X.Y.jar
 and lib/icu4j-X.Y.jar
@@ -13,4 +15,6 @@ Stempel relies on lucene-libs/lucene-analyzers-stempel-X.Y.jar
 
 Morfologik relies on lucene-libs/lucene-analyzers-morfologik-X.Y.jar
 and lib/morfologik-*.jar
- 
+
+OpenNLP relies on lucene-libs/lucene-analyzers-opennlp-X.Y.jar
+and lib/opennlp-*.jar

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/build.xml
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/build.xml b/solr/contrib/analysis-extras/build.xml
index 38d67dd..68a88ad 100644
--- a/solr/contrib/analysis-extras/build.xml
+++ b/solr/contrib/analysis-extras/build.xml
@@ -7,9 +7,9 @@
     The ASF licenses this file to You under the Apache License, Version 2.0
     the "License"); you may not use this file except in compliance with
     the License.  You may obtain a copy of the License at
- 
+
         http://www.apache.org/licenses/LICENSE-2.0
- 
+
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS,
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -24,19 +24,20 @@
   </description>
 
   <import file="../contrib-build.xml"/>
-  
+
   <target name="compile-test" depends="-compile-test-lucene-analysis,common-solr.compile-test"/>
 
   <path id="analysis.extras.lucene.libs">
     <pathelement location="${analyzers-icu.jar}"/>
-    <!-- 
-      Although the smartcn, stempel, and morfologik jars are not dependencies of
+    <!--
+      Although the smartcn, stempel, morfologik and opennlp jars are not dependencies of
       code in the analysis-extras contrib, they must remain here in order to
       populate the Solr distribution
      -->
     <pathelement location="${analyzers-smartcn.jar}"/>
     <pathelement location="${analyzers-stempel.jar}"/>
     <pathelement location="${analyzers-morfologik.jar}"/>
+    <pathelement location="${analyzers-opennlp.jar}"/>
   </path>
 
   <path id="classpath">
@@ -53,12 +54,12 @@
     </dirset>
   </path>
 
-  <!-- 
-    Although the smartcn, stempel, and morfologik jars are not dependencies of
+  <!--
+    Although the smartcn, stempel, morfologik and opennlp jars are not dependencies of
     code in the analysis-extras contrib, they must remain here in order to
     populate the Solr distribution
    -->
-  <target name="module-jars-to-solr" 
+  <target name="module-jars-to-solr"
           depends="-module-jars-to-solr-not-for-package,-module-jars-to-solr-package"/>
   <target name="-module-jars-to-solr-not-for-package" unless="called.from.create-package">
     <antcall inheritall="true">
@@ -66,6 +67,7 @@
       <target name="jar-analyzers-smartcn"/>
       <target name="jar-analyzers-stempel"/>
       <target name="jar-analyzers-morfologik"/>
+      <target name="jar-analyzers-opennlp"/>
     </antcall>
     <property name="analyzers-icu.uptodate" value="true"/> <!-- compile-time dependency -->
     <mkdir dir="${build.dir}/lucene-libs"/>
@@ -85,6 +87,6 @@
     </copy>
   </target>
 
-  <target name="compile-core" depends="jar-analyzers-icu, solr-contrib-build.compile-core"/>
+  <target name="compile-core" depends="jar-analyzers-icu, jar-analyzers-opennlp, solr-contrib-build.compile-core"/>
   <target name="dist" depends="module-jars-to-solr, common-solr.dist"/>
 </project>

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/ivy.xml
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/ivy.xml b/solr/contrib/analysis-extras/ivy.xml
index 0c71701..cfc30c1 100644
--- a/solr/contrib/analysis-extras/ivy.xml
+++ b/solr/contrib/analysis-extras/ivy.xml
@@ -24,6 +24,9 @@
   </configurations>
   <dependencies>
     <dependency org="com.ibm.icu" name="icu4j" rev="${/com.ibm.icu/icu4j}" conf="compile"/>
+    <dependency org="org.apache.opennlp" name="opennlp-tools" rev="${/org.apache.opennlp/opennlp-tools}" conf="compile" />
+    <dependency org="org.apache.opennlp" name="opennlp-maxent" rev="${/org.apache.opennlp/opennlp-maxent}" conf="compile" />
+
     <!--
       Although the 3rd party morfologik jars are not dependencies of code in
       the analysis-extras contrib, they must remain here in order to

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java b/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
new file mode 100644
index 0000000..d00df2b
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
@@ -0,0 +1,571 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import static org.apache.solr.common.SolrException.ErrorCode.SERVER_ERROR;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
+
+import opennlp.tools.util.Span;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.opennlp.OpenNLPTokenizer;
+import org.apache.lucene.analysis.opennlp.tools.NLPNERTaggerOp;
+import org.apache.lucene.analysis.opennlp.tools.OpenNLPOpsFactory;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
+import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.SolrInputField;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.common.util.Pair;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.schema.FieldType;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessor.FieldNameSelector;
+import org.apache.solr.update.processor.FieldMutatingUpdateProcessorFactory.SelectorParams;
+import org.apache.solr.util.plugin.SolrCoreAware;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Extracts named entities using an OpenNLP NER <code>modelFile</code> from the values found in
+ * any matching <code>source</code> field into a configured <code>dest</code> field, after
+ * first tokenizing the source text using the index analyzer on the configured
+ * <code>analyzerFieldType</code>, which must include <code>solr.OpenNLPTokenizerFactory</code>
+ * as the tokenizer. E.g.:
+ *
+ * <pre class="prettyprint">
+ *   &lt;fieldType name="opennlp-en-tokenization" class="solr.TextField"&gt;
+ *     &lt;analyzer&gt;
+ *       &lt;tokenizer class="solr.OpenNLPTokenizerFactory"
+ *                  sentenceModel="en-sent.bin"
+ *                  tokenizerModel="en-tokenizer.bin"/&gt;
+ *     &lt;/analyzer&gt;
+ *   &lt;/fieldType&gt;
+ * </pre>
+ * 
+ * <p>See the <a href="http://opennlp.apache.org/models.html">OpenNLP website</a>
+ * for information on downloading pre-trained models.</p>
+ *
+ * <p>
+ * The <code>source</code> field(s) can be configured as either:
+ * </p>
+ * <ul>
+ *  <li>One or more <code>&lt;str&gt;</code></li>
+ *  <li>An <code>&lt;arr&gt;</code> of <code>&lt;str&gt;</code></li>
+ *  <li>A <code>&lt;lst&gt;</code> containing
+ *   {@link FieldMutatingUpdateProcessorFactory FieldMutatingUpdateProcessorFactory style selector arguments}</li>
+ * </ul>
+ *
+ * <p>The <code>dest</code> field can be a single <code>&lt;str&gt;</code>
+ * containing the literal name of a destination field, or it may be a <code>&lt;lst&gt;</code> specifying a
+ * regex <code>pattern</code> and a <code>replacement</code> string. If the pattern + replacement option
+ * is used the pattern will be matched against all fields matched by the source selector, and the replacement
+ * string (including any capture groups specified from the pattern) will be evaluated using
+ * {@link Matcher#replaceAll(String)} to generate the literal name of the destination field.  Additionally,
+ * an occurrence of the string "{EntityType}" in the <code>dest</code> field specification, or in the
+ * <code>replacement</code> string, will be replaced with the entity type(s) returned for each entity by
+ * the OpenNLP NER model; as a result, if the model extracts more than one entity type, then more than one
+ * <code>dest</code> field will be populated.
+ * </p>
+ *
+ * <p>If the resolved <code>dest</code> field already exists in the document, then the
+ * named entities extracted from the <code>source</code> fields will be added to it.
+ * </p>
+ * <p>
+ * In the example below:
+ * </p>
+ * <ul>
+ *   <li>Named entities will be extracted from the <code>text</code> field and added
 *       to the <code>people_s</code> field</li>
+ *   <li>Named entities will be extracted from both the <code>title</code> and
+ *       <code>subtitle</code> fields and added into the <code>titular_people</code> field</li>
+ *   <li>Named entities will be extracted from any field with a name ending in <code>_txt</code>
 *       -- except for <code>notes_txt</code> -- and added into the <code>people_s</code> field</li>
+ *   <li>Named entities will be extracted from any field with a name beginning with "desc" and
+ *       ending in "s" (e.g. "descs" and "descriptions") and added to a field prefixed with "key_",
+ *       not ending in "s", and suffixed with "_people". (e.g. "key_desc_people" or
+ *       "key_description_people")</li>
 *   <li>Named entities will be extracted from the <code>summary</code> field and added
 *       to the <code>summary_person_s</code> field, assuming that the modelFile only extracts
 *       entities of type "person".</li>
+ * </ul>
+ *
+ * <pre class="prettyprint">
+ * &lt;updateRequestProcessorChain name="multiple-extract"&gt;
+ *   &lt;processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"&gt;
+ *     &lt;str name="modelFile"&gt;en-test-ner-person.bin&lt;/str&gt;
+ *     &lt;str name="analyzerFieldType"&gt;opennlp-en-tokenization&lt;/str&gt;
+ *     &lt;str name="source"&gt;text&lt;/str&gt;
+ *     &lt;str name="dest"&gt;people_s&lt;/str&gt;
+ *   &lt;/processor&gt;
+ *   &lt;processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"&gt;
+ *     &lt;str name="modelFile"&gt;en-test-ner-person.bin&lt;/str&gt;
+ *     &lt;str name="analyzerFieldType"&gt;opennlp-en-tokenization&lt;/str&gt;
+ *     &lt;arr name="source"&gt;
+ *       &lt;str&gt;title&lt;/str&gt;
+ *       &lt;str&gt;subtitle&lt;/str&gt;
+ *     &lt;/arr&gt;
+ *     &lt;str name="dest"&gt;titular_people&lt;/str&gt;
+ *   &lt;/processor&gt;
+ *   &lt;processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"&gt;
+ *     &lt;str name="modelFile"&gt;en-test-ner-person.bin&lt;/str&gt;
+ *     &lt;str name="analyzerFieldType"&gt;opennlp-en-tokenization&lt;/str&gt;
+ *     &lt;lst name="source"&gt;
+ *       &lt;str name="fieldRegex"&gt;.*_txt$&lt;/str&gt;
+ *       &lt;lst name="exclude"&gt;
+ *         &lt;str name="fieldName"&gt;notes_txt&lt;/str&gt;
+ *       &lt;/lst&gt;
+ *     &lt;/lst&gt;
+ *     &lt;str name="dest"&gt;people_s&lt;/str&gt;
+ *   &lt;/processor&gt;
+ *   &lt;processor class="solr.processor.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"&gt;
+ *     &lt;str name="modelFile"&gt;en-test-ner-person.bin&lt;/str&gt;
+ *     &lt;str name="analyzerFieldType"&gt;opennlp-en-tokenization&lt;/str&gt;
+ *     &lt;lst name="source"&gt;
+ *       &lt;str name="fieldRegex"&gt;^desc(.*)s$&lt;/str&gt;
+ *     &lt;/lst&gt;
+ *     &lt;lst name="dest"&gt;
+ *       &lt;str name="pattern"&gt;^desc(.*)s$&lt;/str&gt;
+ *       &lt;str name="replacement"&gt;key_desc$1_people&lt;/str&gt;
+ *     &lt;/lst&gt;
+ *   &lt;/processor&gt;
+ *   &lt;processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory"&gt;
+ *     &lt;str name="modelFile"&gt;en-test-ner-person.bin&lt;/str&gt;
+ *     &lt;str name="analyzerFieldType"&gt;opennlp-en-tokenization&lt;/str&gt;
+ *     &lt;str name="source"&gt;summary&lt;/str&gt;
+ *     &lt;str name="dest"&gt;summary_{EntityType}_s&lt;/str&gt;
+ *   &lt;/processor&gt;
+ * &lt;/updateRequestProcessorChain&gt;
+ * </pre>
+ *
+ * @since 7.3.0
+ */
+public class OpenNLPExtractNamedEntitiesUpdateProcessorFactory
+    extends UpdateRequestProcessorFactory implements SolrCoreAware {
+
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String SOURCE_PARAM = "source";
+  public static final String DEST_PARAM = "dest";
+  public static final String PATTERN_PARAM = "pattern";
+  public static final String REPLACEMENT_PARAM = "replacement";
+  public static final String MODEL_PARAM = "modelFile";
+  public static final String ANALYZER_FIELD_TYPE_PARAM = "analyzerFieldType";
+  public static final String ENTITY_TYPE = "{EntityType}";
+
+  private SelectorParams srcInclusions = new SelectorParams();
+  private Collection<SelectorParams> srcExclusions = new ArrayList<>();
+
+  private FieldNameSelector srcSelector = null;
+
+  private String modelFile = null;
+  private String analyzerFieldType = null;
+
+  /**
+   * If pattern is null, this is a literal field name.  If pattern is non-null then this
+   * is a replacement string that may contain meta-characters (i.e. capture group identifiers)
+   * @see #pattern
+   */
+  private String dest = null;
+  /** @see #dest */
+  private Pattern pattern = null;
+
+  protected final FieldNameSelector getSourceSelector() {
+    if (null != srcSelector) return srcSelector;
+
+    throw new SolrException(SERVER_ERROR, "selector was never initialized, inform(SolrCore) never called???");
+  }
+
+  @SuppressWarnings("unchecked")
+  @Override
+  public void init(NamedList args) {
+
+    // high level (loose) check for which type of config we have.
+    //
+    // individual init methods do more strict syntax checking
+    if (0 <= args.indexOf(SOURCE_PARAM, 0) && 0 <= args.indexOf(DEST_PARAM, 0) ) {
+      initSourceSelectorSyntax(args);
+    } else if (0 <= args.indexOf(PATTERN_PARAM, 0) && 0 <= args.indexOf(REPLACEMENT_PARAM, 0)) {
+      initSimpleRegexReplacement(args);
+    } else {
+      throw new SolrException(SERVER_ERROR, "A combination of either '" + SOURCE_PARAM + "' + '"+
+          DEST_PARAM + "', or '" + REPLACEMENT_PARAM + "' + '" +
+          PATTERN_PARAM + "' init params are mandatory");
+    }
+
+    Object modelParam = args.remove(MODEL_PARAM);
+    if (null == modelParam) {
+      throw new SolrException(SERVER_ERROR, "Missing required init param '" + MODEL_PARAM + "'");
+    }
+    if ( ! (modelParam instanceof CharSequence)) {
+      throw new SolrException(SERVER_ERROR, "Init param '" + MODEL_PARAM + "' must be a <str>");
+    }
+    modelFile = modelParam.toString();
+
+    Object analyzerFieldTypeParam = args.remove(ANALYZER_FIELD_TYPE_PARAM);
+    if (null == analyzerFieldTypeParam) {
+      throw new SolrException(SERVER_ERROR, "Missing required init param '" + ANALYZER_FIELD_TYPE_PARAM + "'");
+    }
+    if ( ! (analyzerFieldTypeParam instanceof CharSequence)) {
+      throw new SolrException(SERVER_ERROR, "Init param '" + ANALYZER_FIELD_TYPE_PARAM + "' must be a <str>");
+    }
+    analyzerFieldType = analyzerFieldTypeParam.toString();
+
+    if (0 < args.size()) {
+      throw new SolrException(SERVER_ERROR, "Unexpected init param(s): '" + args.getName(0) + "'");
+    }
+
+    super.init(args);
+  }
+
+  /**
+   * init helper method that should only be called when we know for certain that both the
+   * "source" and "dest" init params do <em>not</em> exist.
+   */
+  @SuppressWarnings("unchecked")
+  private void initSimpleRegexReplacement(NamedList args) {
+    // The syntactic sugar for the case where there is only one regex pattern for source and the same pattern
+    // is used for the destination pattern...
+    //
+    //  pattern != null && replacement != null
+    //
+    // ...as top level elements, with no other config options specified
+
+    // if we got here we know we had pattern and replacement, now check for the other two  so that we can give a better
+    // message than "unexpected"
+    if (0 <= args.indexOf(SOURCE_PARAM, 0) || 0 <= args.indexOf(DEST_PARAM, 0) ) {
+      throw new SolrException(SERVER_ERROR, "Shorthand syntax must not be mixed with full syntax. Found " +
+          PATTERN_PARAM + " and " + REPLACEMENT_PARAM + " but also found " + SOURCE_PARAM + " or " + DEST_PARAM);
+    }
+
+    assert args.indexOf(SOURCE_PARAM, 0) < 0;
+
+    Object patt = args.remove(PATTERN_PARAM);
+    Object replacement = args.remove(REPLACEMENT_PARAM);
+
+    if (null == patt || null == replacement) {
+      throw new SolrException(SERVER_ERROR, "Init params '" + PATTERN_PARAM + "' and '" +
+          REPLACEMENT_PARAM + "' are both mandatory if '" + SOURCE_PARAM + "' and '"+
+          DEST_PARAM + "' are not both specified");
+    }
+
+    if (0 != args.size()) {
+      throw new SolrException(SERVER_ERROR, "Init params '" + REPLACEMENT_PARAM + "' and '" +
+          PATTERN_PARAM + "' must be children of '" + DEST_PARAM +
+          "' to be combined with other options.");
+    }
+
+    if (!(replacement instanceof String)) {
+      throw new SolrException(SERVER_ERROR, "Init param '" + REPLACEMENT_PARAM + "' must be a string (i.e. <str>)");
+    }
+    if (!(patt instanceof String)) {
+      throw new SolrException(SERVER_ERROR, "Init param '" + PATTERN_PARAM + "' must be a string (i.e. <str>)");
+    }
+
+    dest = replacement.toString();
+    try {
+      this.pattern = Pattern.compile(patt.toString());
+    } catch (PatternSyntaxException pe) {
+      throw new SolrException(SERVER_ERROR, "Init param " + PATTERN_PARAM +
+          " is not a valid regex pattern: " + patt, pe);
+
+    }
+    srcInclusions = new SelectorParams();
+    srcInclusions.fieldRegex = Collections.singletonList(this.pattern);
+  }
+
+  /**
+   * init helper method that should only be called when we know for certain that both the
+   * "source" and "dest" init params <em>do</em> exist.
+   */
+  @SuppressWarnings("unchecked")
+  private void initSourceSelectorSyntax(NamedList args) {
+    // Full and complete syntax where source and dest are mandatory.
+    //
+    // source may be a single string or a selector.
+    // dest may be a single string or list containing pattern and replacement
+    //
+    //   source != null && dest != null
+
+    // if we got here we know we had source and dest, now check for the other two so that we can give a better
+    // message than "unexpected"
+    if (0 <= args.indexOf(PATTERN_PARAM, 0) || 0 <= args.indexOf(REPLACEMENT_PARAM, 0) ) {
+      throw new SolrException(SERVER_ERROR, "Shorthand syntax must not be mixed with full syntax. Found " +
+          SOURCE_PARAM + " and " + DEST_PARAM + " but also found " + PATTERN_PARAM + " or " + REPLACEMENT_PARAM);
+    }
+
+    Object d = args.remove(DEST_PARAM);
+    assert null != d;
+
+    List<Object> sources = args.getAll(SOURCE_PARAM);
+    assert null != sources;
+
+    if (1 == sources.size()) {
+      if (sources.get(0) instanceof NamedList) {
+        // nested set of selector options
+        NamedList selectorConfig = (NamedList) args.remove(SOURCE_PARAM);
+
+        srcInclusions = parseSelectorParams(selectorConfig);
+
+        List<Object> excList = selectorConfig.getAll("exclude");
+
+        for (Object excObj : excList) {
+          if (null == excObj) {
+            throw new SolrException(SERVER_ERROR, "Init param '" + SOURCE_PARAM +
+                "' child 'exclude' can not be null");
+          }
+          if (!(excObj instanceof NamedList)) {
+            throw new SolrException(SERVER_ERROR, "Init param '" + SOURCE_PARAM +
+                "' child 'exclude' must be <lst/>");
+          }
+          NamedList exc = (NamedList) excObj;
+          srcExclusions.add(parseSelectorParams(exc));
+          if (0 < exc.size()) {
+            throw new SolrException(SERVER_ERROR, "Init param '" + SOURCE_PARAM +
+                "' has unexpected 'exclude' sub-param(s): '"
+                + selectorConfig.getName(0) + "'");
+          }
+          // call once per instance
+          selectorConfig.remove("exclude");
+        }
+
+        if (0 < selectorConfig.size()) {
+          throw new SolrException(SERVER_ERROR, "Init param '" + SOURCE_PARAM +
+              "' contains unexpected child param(s): '" +
+              selectorConfig.getName(0) + "'");
+        }
+        // consume from the named list so it doesn't interfere with subsequent processing
+        sources.remove(0);
+      }
+    }
+    if (1 <= sources.size()) {
+      // source better be one or more strings
+      srcInclusions.fieldName = new HashSet<>(args.removeConfigArgs("source"));
+    }
+    if (srcInclusions == null) {
+      throw new SolrException(SERVER_ERROR,
+          "Init params do not specify any field from which to extract entities, please supply either "
+          + SOURCE_PARAM + " and " + DEST_PARAM + " or " + PATTERN_PARAM + " and " + REPLACEMENT_PARAM + ". See javadocs " +
+          "for OpenNLPExtractNamedEntitiesUpdateProcessorFactory for further details.");
+    }
+
+    if (d instanceof NamedList) {
+      NamedList destList = (NamedList) d;
+
+      Object patt = destList.remove(PATTERN_PARAM);
+      Object replacement = destList.remove(REPLACEMENT_PARAM);
+
+      if (null == patt || null == replacement) {
+        throw new SolrException(SERVER_ERROR, "Init param '" + DEST_PARAM + "' children '" +
+            PATTERN_PARAM + "' and '" + REPLACEMENT_PARAM +
+            "' are both mandatory and can not be null");
+      }
+      if (! (patt instanceof String && replacement instanceof String)) {
+        throw new SolrException(SERVER_ERROR, "Init param '" + DEST_PARAM + "' children '" +
+            PATTERN_PARAM + "' and '" + REPLACEMENT_PARAM +
+            "' must both be strings (i.e. <str>)");
+      }
+      if (0 != destList.size()) {
+        throw new SolrException(SERVER_ERROR, "Init param '" + DEST_PARAM + "' has unexpected children: '"
+            + destList.getName(0) + "'");
+      }
+
+      try {
+        this.pattern = Pattern.compile(patt.toString());
+      } catch (PatternSyntaxException pe) {
+        throw new SolrException(SERVER_ERROR, "Init param '" + DEST_PARAM + "' child '" + PATTERN_PARAM +
+            "' is not a valid regex pattern: " + patt, pe);
+      }
+      dest = replacement.toString();
+
+    } else if (d instanceof String) {
+      dest = d.toString();
+    } else {
+      throw new SolrException(SERVER_ERROR, "Init param '" + DEST_PARAM + "' must either be a string " +
+          "(i.e. <str>) or a list (i.e. <lst>) containing '" +
+          PATTERN_PARAM + "' and '" + REPLACEMENT_PARAM + "'");
+    }
+
+  }
+
+  @Override
+  public void inform(final SolrCore core) {
+
+    srcSelector =
+        FieldMutatingUpdateProcessor.createFieldNameSelector
+            (core.getResourceLoader(), core, srcInclusions, FieldMutatingUpdateProcessor.SELECT_NO_FIELDS);
+
+    for (SelectorParams exc : srcExclusions) {
+      srcSelector = FieldMutatingUpdateProcessor.wrap
+          (srcSelector,
+              FieldMutatingUpdateProcessor.createFieldNameSelector
+                  (core.getResourceLoader(), core, exc, FieldMutatingUpdateProcessor.SELECT_NO_FIELDS));
+    }
+    try {
+      OpenNLPOpsFactory.getNERTaggerModel(modelFile, core.getResourceLoader());
+    } catch (IOException e) {
+      throw new IllegalArgumentException(e);
+    }
+  }
+
+  @Override
+  public final UpdateRequestProcessor getInstance
+      (SolrQueryRequest req, SolrQueryResponse rsp, UpdateRequestProcessor next) {
+    final FieldNameSelector srcSelector = getSourceSelector();
+    return new UpdateRequestProcessor(next) {
+      private final NLPNERTaggerOp nerTaggerOp;
+      private Analyzer analyzer = null;
+      {
+        try {
+          nerTaggerOp = OpenNLPOpsFactory.getNERTagger(modelFile);
+          FieldType fieldType = req.getSchema().getFieldTypeByName(analyzerFieldType);
+          if (fieldType == null) {
+            throw new SolrException
+                (SERVER_ERROR, ANALYZER_FIELD_TYPE_PARAM + " '" + analyzerFieldType + "' not found in the schema.");
+          }
+          analyzer = fieldType.getIndexAnalyzer();
+        } catch (IOException e) {
+          throw new IllegalArgumentException(e);
+        }
+      }
+
+      @Override
+      public void processAdd(AddUpdateCommand cmd) throws IOException {
+
+        final SolrInputDocument doc = cmd.getSolrInputDocument();
+
+        // Destination may be regex replace string, or "{EntityType}" replaced by
+        // each entity's type, both of which can cause multiple output fields.
+        Map<String,SolrInputField> destMap = new HashMap<>();
+
+        // preserve initial values
+        for (final String fname : doc.getFieldNames()) {
+          if ( ! srcSelector.shouldMutate(fname)) continue;
+
+          Collection<Object> srcFieldValues = doc.getFieldValues(fname);
+          if (srcFieldValues == null || srcFieldValues.isEmpty()) continue;
+
+          String resolvedDest = dest;
+
+          if (pattern != null) {
+            Matcher matcher = pattern.matcher(fname);
+            if (matcher.find()) {
+              resolvedDest = matcher.replaceAll(dest);
+            } else {
+              log.debug("srcSelector.shouldMutate(\"{}\") returned true, " +
+                  "but replacement pattern did not match, field skipped.", fname);
+              continue;
+            }
+          }
+
+          for (Object val : srcFieldValues) {
+            for (Pair<String,String> entity : extractTypedNamedEntities(val)) {
+              String entityName = entity.first();
+              String entityType = entity.second();
+              // Resolve "{EntityType}" per entity rather than overwriting resolvedDest,
+              // so that entities of different types populate different dest fields.
+              String typedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
+              SolrInputField destField;
+              if (doc.containsKey(typedDest)) {
+                destField = doc.getField(typedDest);
+              } else {
+                SolrInputField targetField = destMap.get(typedDest);
+                destField = (targetField == null) ? new SolrInputField(typedDest) : targetField;
+              }
+              destField.addValue(entityName);
+
+              // put it in the map to avoid concurrent modification...
+              destMap.put(typedDest, destField);
+            }
+          }
+        }
+
+        for (Map.Entry<String,SolrInputField> entry : destMap.entrySet()) {
+          doc.put(entry.getKey(), entry.getValue());
+        }
+        super.processAdd(cmd);
+      }
+
+      /** Using the configured NER model, extracts (name, type) pairs from the given source field value */
+      private List<Pair<String,String>> extractTypedNamedEntities(Object srcFieldValue) throws IOException {
+        List<Pair<String,String>> entitiesWithType = new ArrayList<>();
+        List<String> terms = new ArrayList<>();
+        List<Integer> startOffsets = new ArrayList<>();
+        List<Integer> endOffsets = new ArrayList<>();
+        String fullText = srcFieldValue.toString();
+        TokenStream tokenStream = analyzer.tokenStream("", fullText);
+        CharTermAttribute termAtt = tokenStream.addAttribute(CharTermAttribute.class);
+        OffsetAttribute offsetAtt = tokenStream.addAttribute(OffsetAttribute.class);
+        FlagsAttribute flagsAtt = tokenStream.addAttribute(FlagsAttribute.class);
+        tokenStream.reset();
+        synchronized (nerTaggerOp) {
+          while (tokenStream.incrementToken()) {
+            terms.add(termAtt.toString());
+            startOffsets.add(offsetAtt.startOffset());
+            endOffsets.add(offsetAtt.endOffset());
+            boolean endOfSentence = 0 != (flagsAtt.getFlags() & OpenNLPTokenizer.EOS_FLAG_BIT);
+            if (endOfSentence) {    // extract named entities one sentence at a time
+              extractEntitiesFromSentence(fullText, terms, startOffsets, endOffsets, entitiesWithType);
+            }
+          }
+          tokenStream.end();
+          tokenStream.close();
+          if (!terms.isEmpty()) { // In case last token of last sentence isn't properly flagged with EOS_FLAG_BIT
+            extractEntitiesFromSentence(fullText, terms, startOffsets, endOffsets, entitiesWithType);
+          }
+          nerTaggerOp.reset();      // Forget all adaptive data collected during previous calls
+        }
+        return entitiesWithType;
+      }
+
+      private void extractEntitiesFromSentence(String fullText, List<String> terms, List<Integer> startOffsets,
+                                               List<Integer> endOffsets, List<Pair<String,String>> entitiesWithType) {
+        for (Span span : nerTaggerOp.getNames(terms.toArray(new String[terms.size()]))) {
+          String text = fullText.substring(startOffsets.get(span.getStart()), endOffsets.get(span.getEnd() - 1));
+          entitiesWithType.add(new Pair<>(text, span.getType()));
+        }
+        terms.clear();
+        startOffsets.clear();
+        endOffsets.clear();
+      }
+    };
+  }
+
+  /** macro */
+  private static SelectorParams parseSelectorParams(NamedList args) {
+    return FieldMutatingUpdateProcessorFactory.parseSelectorParams(args);
+  }
+}
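The destination-field resolution described in the javadoc above (a regex pattern/replacement applied to the source field name, followed by "{EntityType}" substitution) can be sketched with plain JDK regex. This standalone class is illustrative only, not part of the patch; the method and class names are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative sketch of how the factory resolves a destination field name:
 * first the dest pattern/replacement is applied to the source field name,
 * then "{EntityType}" is replaced with the entity type the NER model reports.
 */
public class DestNameResolutionSketch {

  static String resolveDest(String srcField, Pattern pattern, String replacement, String entityType) {
    String resolved = replacement;
    if (pattern != null) {
      Matcher m = pattern.matcher(srcField);
      if (!m.find()) {
        return null; // field matched the source selector but not the dest pattern: skipped
      }
      resolved = m.replaceAll(replacement);
    }
    // "{EntityType}" is substituted per extracted entity
    return resolved.replace("{EntityType}", entityType);
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("^desc(.*)s$");
    System.out.println(resolveDest("descriptions", p, "key_desc$1_people", "person"));
    // -> key_description_people
    System.out.println(resolveDest("summary", null, "summary_{EntityType}_s", "person"));
    // -> summary_person_s
  }
}
```

This mirrors the two dest styles in the javadoc example: the `^desc(.*)s$` / `key_desc$1_people` pair and the literal `summary_{EntityType}_s` form.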

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/package.html
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/package.html b/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/package.html
new file mode 100644
index 0000000..1388c29
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/java/org/apache/solr/update/processor/package.html
@@ -0,0 +1,24 @@
+<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!-- not a package-info.java, because we already defined this package in core/ -->
+<html>
+  <body>
+    Update request processor invoking OpenNLP Named Entity Recognition over configured
+    source field(s), populating configured target field(s) with the results.
+  </body>
+</html>

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-ner-person.bin
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-ner-person.bin b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-ner-person.bin
new file mode 100644
index 0000000..0b40aac
Binary files /dev/null and b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-ner-person.bin differ

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-sent.bin
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-sent.bin b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-sent.bin
new file mode 100644
index 0000000..4252bcb
Binary files /dev/null and b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-sent.bin differ

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-tokenizer.bin
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-tokenizer.bin b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-tokenizer.bin
new file mode 100644
index 0000000..94668c0
Binary files /dev/null and b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/en-test-tokenizer.bin differ

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/schema-opennlp-extract.xml
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/schema-opennlp-extract.xml b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/schema-opennlp-extract.xml
new file mode 100644
index 0000000..fc13431
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/schema-opennlp-extract.xml
@@ -0,0 +1,49 @@
+<?xml version="1.0" ?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<schema name="test-opennlp-extract" version="1.6">
+  <fieldType name="opennlp-en-tokenization" class="solr.TextField">
+    <analyzer>
+      <tokenizer class="solr.OpenNLPTokenizerFactory"
+                 sentenceModel="en-test-sent.bin"
+                 tokenizerModel="en-test-tokenizer.bin"/>
+    </analyzer>
+  </fieldType>
+
+  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
+
+  <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
+    <analyzer>
+      <tokenizer class="solr.MockTokenizerFactory"/>
+      <filter class="solr.LowerCaseFilterFactory"/>
+      <filter class="solr.PorterStemFilterFactory"/>
+    </analyzer>
+  </fieldType>
+
+  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
+  <field name="text" type="text" indexed="true" stored="false"/>
+  <field name="subject" type="text" indexed="true" stored="true"/>
+  <field name="title" type="text" indexed="true" stored="true"/>
+  <field name="subtitle" type="text" indexed="true" stored="true"/>
+  <field name="descs" type="text" indexed="true" stored="true"/>
+  <field name="descriptions" type="text" indexed="true" stored="true"/>
+
+  <dynamicField name="*_txt" type="text" indexed="true" stored="true"/>
+  <dynamicField name="*_s" type="string" indexed="true" stored="true" multiValued="true"/>
+  <dynamicField name="*_people" type="string" indexed="true" stored="true" multiValued="true"/>
+</schema>
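The test schema above wires `solr.OpenNLPTokenizerFactory` into the analyzer chain; the update processor relies on the start/end offsets that tokenizer reports to map each OpenNLP `Span` (expressed in token indices) back onto the original text. A JDK-only sketch of that mapping, with invented class and data for illustration:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch of the span-to-text mapping used in
 * extractEntitiesFromSentence: a Span's token-index range is converted to a
 * character range via the tokenizer's recorded offsets.
 */
public class SpanToTextSketch {

  static String spanText(String fullText, List<Integer> startOffsets, List<Integer> endOffsets,
                         int spanStart, int spanEnd) {
    // Span start is inclusive and end is exclusive, so the last token is spanEnd - 1
    return fullText.substring(startOffsets.get(spanStart), endOffsets.get(spanEnd - 1));
  }

  public static void main(String[] args) {
    String fullText = "Flashman met John Charity Spring";
    // offsets as a tokenizer would report them for the five whitespace-separated tokens
    List<Integer> starts = Arrays.asList(0, 9, 13, 18, 26);
    List<Integer> ends = Arrays.asList(8, 12, 17, 25, 32);
    // a NER model returning Span(2, 5, "person") covers tokens John..Spring
    System.out.println(spanText(fullText, starts, ends, 2, 5)); // John Charity Spring
  }
}
```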

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig-opennlp-extract.xml
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig-opennlp-extract.xml b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig-opennlp-extract.xml
new file mode 100644
index 0000000..c44c9e1
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig-opennlp-extract.xml
@@ -0,0 +1,206 @@
+<?xml version="1.0" ?>
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<config>
+  <luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
+  <xi:include href="solrconfig.snippet.randomindexconfig.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+  <requestHandler name="/select" class="solr.SearchHandler"></requestHandler>
+  <requestHandler name="/update" class="solr.UpdateRequestHandler"  />
+  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.RAMDirectoryFactory}"/>
+  <schemaFactory class="ClassicIndexSchemaFactory"/>
+
+  <updateRequestProcessorChain name="extract-single">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">source1_s</str>
+      <str name="dest">dest_s</str>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-single-regex">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">source1_s</str>
+      <lst name="dest">
+        <str name="pattern">source\d(_s)</str>
+        <str name="replacement">dest$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-multi">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">source1_s</str>
+      <str name="source">source2_s</str>
+      <str name="dest">dest_s</str>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-multi-regex">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">source1_s</str>
+      <str name="source">source2_s</str>
+      <lst name="dest">
+        <str name="pattern">source\d(_s)</str>
+        <str name="replacement">dest$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-array">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <arr name="source">
+        <str>source1_s</str>
+        <str>source2_s</str>
+      </arr>
+      <str name="dest">dest_s</str>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-array-regex">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <arr name="source">
+        <str>source1_s</str>
+        <str>source2_s</str>
+      </arr>
+      <lst name="dest">
+        <str name="pattern">source\d(_s)</str>
+        <str name="replacement">dest$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-selector">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">source\d_.*</str>
+        <lst name="exclude">
+          <str name="fieldRegex">source0_.*</str>
+        </lst>
+      </lst>
+      <str name="dest">dest_s</str>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-selector-regex">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">source\d_.*</str>
+        <lst name="exclude">
+          <str name="fieldRegex">source0_.*</str>
+        </lst>
+      </lst>
+      <lst name="dest">
+        <str name="pattern">source\d(_s)</str>
+        <str name="replacement">dest$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-regex-replaceall">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">foo.*</str>
+      </lst>
+      <lst name="dest">
+        <!-- unbounded pattern that can be replaced multiple times in field name -->
+        <str name="pattern">x(\d)</str>
+        <str name="replacement">y$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <updateRequestProcessorChain name="extract-regex-replaceall-with-entity-type">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">foo.*</str>
+      </lst>
+      <lst name="dest">
+        <!-- unbounded pattern that can be replaced multiple times in field name -->
+        <str name="pattern">x(\d)</str>
+        <str name="replacement">{EntityType}_y$1</str>
+      </lst>
+    </processor>
+  </updateRequestProcessorChain>
+
+  <!-- example used in OpenNLPExtractNamedEntitiesUpdateProcessorFactory javadocs -->
+  <updateRequestProcessorChain name="multiple-extract">
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">text</str>
+      <str name="dest">people_s</str>
+    </processor>
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <arr name="source">
+        <str>title</str>
+        <str>subtitle</str>
+      </arr>
+      <str name="dest">titular_people</str>
+    </processor>
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">.*_txt$</str>
+        <lst name="exclude">
+          <str name="fieldName">notes_txt</str>
+        </lst>
+      </lst>
+      <str name="dest">people_s</str>
+    </processor>
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <lst name="source">
+        <str name="fieldRegex">^desc(.*)s$</str>
+      </lst>
+      <lst name="dest">
+        <str name="pattern">^desc(.*)s$</str>
+        <str name="replacement">key_desc$1_people</str>
+      </lst>
+    </processor>
+    <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
+      <str name="modelFile">en-test-ner-person.bin</str>
+      <str name="analyzerFieldType">opennlp-en-tokenization</str>
+      <str name="source">summary</str>
+      <str name="dest">summary_{EntityType}_s</str>
+    </processor>
+  </updateRequestProcessorChain>
+</config>
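The `<lst name="dest">` blocks above pair a `pattern` with a `replacement`; each matching source field name is rewritten with Java regex replace-all semantics, and the result names the destination field. A minimal sketch of that rewriting (illustrative class and method names, not the factory's code):

```java
import java.util.regex.Pattern;

public class DestFieldRewrite {
    // Mirrors the dest <pattern>source\d(_s)</pattern> /
    // <replacement>dest$1</replacement> pair used by the *-regex chains above;
    // $1 carries the captured "_s" suffix into the destination field name.
    static String rewrite(String sourceField) {
        return Pattern.compile("source\\d(_s)").matcher(sourceField).replaceAll("dest$1");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("source1_s")); // source1_s and source2_s both map to dest_s
        System.out.println(rewrite("source2_s"));
        System.out.println(rewrite("other_s"));   // non-matching names pass through unchanged
    }
}
```

Because both `source1_s` and `source2_s` rewrite to `dest_s`, entities extracted from either field accumulate in the same destination, which is what the equivalence tests later in this commit rely on.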

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml
new file mode 100644
index 0000000..23516b0
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/test-files/analysis-extras/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml
@@ -0,0 +1,48 @@
+<?xml version="1.0" ?>
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!--
+A solrconfig.xml snippet containing indexConfig settings for randomized testing.
+-->
+<indexConfig>
+  <!-- this sys property is not set by SolrTestCaseJ4 because we ideally want to use
+       the RandomMergePolicy in all tests - but some tests expect very specific
+       merge behavior, so those tests can set it as needed.
+  -->
+  <mergePolicyFactory class="${solr.tests.mergePolicyFactory:org.apache.solr.util.RandomMergePolicyFactory}" />
+
+  <useCompoundFile>${useCompoundFile:false}</useCompoundFile>
+
+  <maxBufferedDocs>${solr.tests.maxBufferedDocs}</maxBufferedDocs>
+  <ramBufferSizeMB>${solr.tests.ramBufferSizeMB}</ramBufferSizeMB>
+
+  <mergeScheduler class="${solr.tests.mergeScheduler}" />
+
+  <writeLockTimeout>1000</writeLockTimeout>
+  <commitLockTimeout>10000</commitLockTimeout>
+
+  <!-- this sys property is not set by SolrTestCaseJ4 because almost all tests should
+       use the single process lockType for speed - but tests that explicitly need
+       to vary the lockType can set it as needed.
+  -->
+  <lockType>${solr.tests.lockType:single}</lockType>
+
+  <infoStream>${solr.tests.infostream:false}</infoStream>
+
+</indexConfig>
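Every setting in this snippet uses Solr's `${name:default}` property-substitution syntax: the named system/test property wins when set, and the value after the colon is the fallback. A hedged sketch of that resolution (not Solr's actual implementation; names are illustrative):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertySubstitution {
    // Resolve ${name:default} expressions against a property map: use the
    // configured value when present, otherwise fall back to the default
    // after the colon (or empty string when no default is given).
    static String resolve(String expr, Map<String, String> props) {
        Matcher m = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}").matcher(expr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) value = (m.group(2) == null) ? "" : m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unset property: the default after ':' wins.
        System.out.println(resolve("${solr.tests.lockType:single}", Map.of()));
        // Set property: the configured value wins.
        System.out.println(resolve("${useCompoundFile:false}", Map.of("useCompoundFile", "true")));
    }
}
```

This is why the randomized test framework can inject merge policies, buffer sizes, and lock types per run while the config file stays fixed.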

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/contrib/analysis-extras/src/test/org/apache/solr/update/processor/TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
----------------------------------------------------------------------
diff --git a/solr/contrib/analysis-extras/src/test/org/apache/solr/update/processor/TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory.java b/solr/contrib/analysis-extras/src/test/org/apache/solr/update/processor/TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
new file mode 100644
index 0000000..dad06a8
--- /dev/null
+++ b/solr/contrib/analysis-extras/src/test/org/apache/solr/update/processor/TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory.java
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.update.processor;
+
+import java.io.File;
+import java.util.Arrays;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.solr.common.SolrInputDocument;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class TestOpenNLPExtractNamedEntitiesUpdateProcessorFactory extends UpdateProcessorTestBase {
+
+  @BeforeClass
+  public static void beforeClass() throws Exception {
+    File testHome = createTempDir().toFile();
+    FileUtils.copyDirectory(getFile("analysis-extras/solr"), testHome);
+    initCore("solrconfig-opennlp-extract.xml", "schema-opennlp-extract.xml", testHome.getAbsolutePath());
+  }
+
+  @Test
+  public void testSimpleExtract() throws Exception {
+    SolrInputDocument doc = processAdd("extract-single",
+        doc(f("id", "1"),
+            f("source1_s", "Take this to Mr. Flashman.")));
+    assertEquals("dest_s should have stringValue", "Flashman", doc.getFieldValue("dest_s"));
+  }
+
+  @Test
+  public void testMultiExtract() throws Exception {
+    SolrInputDocument doc = processAdd("extract-multi",
+        doc(f("id", "1"),
+            f("source1_s", "Hello Flashman."),
+            f("source2_s", "Calling Flashman.")));
+
+    assertEquals(Arrays.asList("Flashman", "Flashman"), doc.getFieldValues("dest_s"));
+  }
+
+  @Test
+  public void testArrayExtract() throws Exception {
+    SolrInputDocument doc = processAdd("extract-array",
+        doc(f("id", "1"),
+            f("source1_s", "Currently we have Flashman. Not much else."),
+            f("source2_s", "Flashman. Is. Not. There.")));
+
+    assertEquals(Arrays.asList("Flashman", "Flashman"), doc.getFieldValues("dest_s"));
+  }
+
+  @Test
+  public void testSelectorExtract() throws Exception {
+    SolrInputDocument doc = processAdd("extract-selector",
+        doc(f("id", "1"),
+            f("source0_s", "Flashman. Or not."),
+            f("source1_s", "Serendipitously, he was. I mean, Flashman. And yet."),
+            f("source2_s", "Correct, Flashman.")));
+
+    assertEquals(Arrays.asList("Flashman", "Flashman"), doc.getFieldValues("dest_s"));
+  }
+
+  public void testMultipleExtracts() throws Exception {
+    // test example from the javadocs
+    SolrInputDocument doc = processAdd("multiple-extract",
+        doc(f("id", "1"),
+            f("text", "From Flashman. To Panman."),
+            f("title", "It's Captain Flashman.", "Privately, Flashman."),
+            f("subtitle", "Ineluctably, Flashman."),
+            f("corrolary_txt", "Forsooth thou bringeth Flashman."),
+            f("notes_txt", "Yes Flashman."),
+            f("summary", "Many aspire to be Flashman."),
+            f("descs", "Courage, Flashman.", "Ain't he Flashman."),
+            f("descriptions", "Flashman. Flashman. Flashman.")));
+
+    assertEquals(Arrays.asList("Flashman", "Flashman"), doc.getFieldValues("people_s"));
+    assertEquals(Arrays.asList("Flashman", "Flashman", "Flashman"), doc.getFieldValues("titular_people"));
+    assertEquals(Arrays.asList("Flashman", "Flashman"), doc.getFieldValues("key_desc_people"));
+    assertEquals(Arrays.asList("Flashman", "Flashman", "Flashman"), doc.getFieldValues("key_description_people"));
+    assertEquals("Flashman", doc.getFieldValue("summary_person_s")); // {EntityType} field name interpolation
+  }
+
+  public void testEquivalentExtraction() throws Exception {
+    SolrInputDocument d;
+
+    // regardless of chain, all of these checks should be equivalent
+    for (String chain : Arrays.asList("extract-single", "extract-single-regex",
+        "extract-multi", "extract-multi-regex",
+        "extract-array", "extract-array-regex",
+        "extract-selector", "extract-selector-regex")) {
+
+      // simple extract
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              f("source0_s", "Totally Flashman."), // not extracted
+              f("source1_s", "One nation under Flashman.", "Good Flashman.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("Flashman", "Flashman"), d.getFieldValues("dest_s"));
+
+      // append to existing values
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              field("dest_s", "orig1", "orig2"),
+              f("source0_s", "Flashman. In totality."), // not extracted
+              f("source1_s", "Two nations under Flashman.", "Meh Flashman.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("orig1", "orig2", "Flashman", "Flashman"), d.getFieldValues("dest_s"));
+    }
+
+    // should be equivalent for any chain matching source1_s and source2_s (but not source0_s)
+    for (String chain : Arrays.asList("extract-multi", "extract-multi-regex",
+        "extract-array", "extract-array-regex",
+        "extract-selector", "extract-selector-regex")) {
+
+      // simple extract
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              f("source0_s", "Not Flashman."), // not extracted
+              f("source1_s", "Could have had a Flashman.", "Bad Flashman."),
+              f("source2_s", "Indubitably Flashman.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("Flashman", "Flashman", "Flashman"), d.getFieldValues("dest_s"));
+
+      // append to existing values
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              field("dest_s", "orig1", "orig2"),
+              f("source0_s", "Never Flashman."), // not extracted
+              f("source1_s", "Seeking Flashman.", "Evil incarnate Flashman."),
+              f("source2_s", "Perfunctorily Flashman.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("orig1", "orig2", "Flashman", "Flashman", "Flashman"), d.getFieldValues("dest_s"));
+    }
+
+    // any chain that copies source1_s to dest_s should be equivalent for these assertions
+    for (String chain : Arrays.asList("extract-single", "extract-single-regex",
+        "extract-multi", "extract-multi-regex",
+        "extract-array", "extract-array-regex",
+        "extract-selector", "extract-selector-regex")) {
+
+      // simple extract
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              f("source1_s", "Always Flashman.", "Flashman. No one else.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("Flashman", "Flashman"), d.getFieldValues("dest_s"));
+
+      // append to existing values
+      d = processAdd(chain,
+          doc(f("id", "1111"),
+              field("dest_s", "orig1", "orig2"),
+              f("source1_s", "Flashman.  And, scene.", "Contemporary Flashman. Yeesh.")));
+      assertNotNull(chain, d);
+      assertEquals(chain, Arrays.asList("orig1", "orig2", "Flashman", "Flashman"), d.getFieldValues("dest_s"));
+    }
+  }
+
+  public void testExtractFieldRegexReplaceAll() throws Exception {
+    SolrInputDocument d = processAdd("extract-regex-replaceall",
+        doc(f("id", "1111"),
+            f("foo_x2_s", "Infrequently Flashman.", "In the words of Flashman."),
+            f("foo_x3_x7_s", "Flashman. Whoa.")));
+
+    assertNotNull(d);
+    assertEquals(Arrays.asList("Flashman", "Flashman"), d.getFieldValues("foo_y2_s"));
+    assertEquals("Flashman", d.getFieldValue("foo_y3_y7_s"));
+  }
+
+  public void testExtractFieldRegexReplaceAllWithEntityType() throws Exception {
+    SolrInputDocument d = processAdd("extract-regex-replaceall-with-entity-type",
+        doc(f("id", "1111"),
+            f("foo_x2_s", "Infrequently Flashman.", "In the words of Flashman."),
+            f("foo_x3_x7_s", "Flashman. Whoa.")));
+
+    assertNotNull(d);
+    assertEquals(d.getFieldNames().toString(), Arrays.asList("Flashman", "Flashman"), d.getFieldValues("foo_person_y2_s"));
+    assertEquals(d.getFieldNames().toString(), "Flashman", d.getFieldValue("foo_person_y3_person_y7_s"));
+  }
+}
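The last two tests exercise an unbounded `pattern` that can match more than once per field name, with `{EntityType}` interpolated into the replacement. That composition can be sketched as follows (illustrative helper, not the factory's API; the second step assumes `{EntityType}` is substituted after the regex rewrite):

```java
import java.util.regex.Pattern;

public class EntityTypeInterpolation {
    // Sketch of pattern x(\d) -> replacement {EntityType}_y$1: rewrite every
    // match in the source field name, then fill in the entity type reported
    // by the NER model (e.g. "person") for each {EntityType} placeholder.
    static String destFieldName(String sourceField, String entityType) {
        String dest = Pattern.compile("x(\\d)")
                             .matcher(sourceField)
                             .replaceAll("{EntityType}_y$1");
        return dest.replace("{EntityType}", entityType);
    }

    public static void main(String[] args) {
        System.out.println(destFieldName("foo_x2_s", "person"));    // foo_person_y2_s
        System.out.println(destFieldName("foo_x3_x7_s", "person")); // foo_person_y3_person_y7_s
    }
}
```

The double substitution in `foo_x3_x7_s` is exactly what `testExtractFieldRegexReplaceAllWithEntityType` asserts with `foo_person_y3_person_y7_s`.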

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/core/src/test/org/apache/solr/update/processor/UpdateProcessorTestBase.java
----------------------------------------------------------------------
diff --git a/solr/core/src/test/org/apache/solr/update/processor/UpdateProcessorTestBase.java b/solr/core/src/test/org/apache/solr/update/processor/UpdateProcessorTestBase.java
deleted file mode 100644
index d3aa979..0000000
--- a/solr/core/src/test/org/apache/solr/update/processor/UpdateProcessorTestBase.java
+++ /dev/null
@@ -1,168 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.solr.update.processor;
-
-import org.apache.solr.SolrTestCaseJ4;
-import org.apache.solr.common.params.SolrParams;
-import org.apache.solr.common.util.IOUtils;
-import org.apache.solr.common.params.ModifiableSolrParams;
-import org.apache.solr.common.SolrInputDocument;
-import org.apache.solr.common.SolrInputField;
-import org.apache.solr.core.SolrCore;
-import org.apache.solr.request.SolrRequestInfo;
-import org.apache.solr.request.LocalSolrQueryRequest;
-import org.apache.solr.request.SolrQueryRequest;
-import org.apache.solr.response.SolrQueryResponse;
-import org.apache.solr.update.AddUpdateCommand;
-import org.apache.solr.update.CommitUpdateCommand;
-import org.apache.solr.update.DeleteUpdateCommand;
-
-import java.io.IOException;
-
-public class UpdateProcessorTestBase extends SolrTestCaseJ4 {
-
-  /**
-   * Runs a document through the specified chain, and returns the final
-   * document used when the chain is completed (NOTE: some chains may
-   * modify the document in place)
-   */
-  protected SolrInputDocument processAdd(final String chain,
-                                         final SolrInputDocument docIn)
-    throws IOException {
-
-    return processAdd(chain, new ModifiableSolrParams(), docIn);
-  }
-
-  /**
-   * Runs a document through the specified chain, and returns the final
-   * document used when the chain is completed (NOTE: some chains may
-   * modify the document in place)
-   */
-  protected SolrInputDocument processAdd(final String chain,
-                                         final SolrParams requestParams,
-                                         final SolrInputDocument docIn)
-    throws IOException {
-
-    SolrCore core = h.getCore();
-    UpdateRequestProcessorChain pc = core.getUpdateProcessingChain(chain);
-    assertNotNull("No Chain named: " + chain, pc);
-
-    SolrQueryResponse rsp = new SolrQueryResponse();
-
-    SolrQueryRequest req = new LocalSolrQueryRequest(core, requestParams);
-    try {
-      SolrRequestInfo.setRequestInfo(new SolrRequestInfo(req, rsp));
-      AddUpdateCommand cmd = new AddUpdateCommand(req);
-      cmd.solrDoc = docIn;
-
-      UpdateRequestProcessor processor = pc.createProcessor(req, rsp);
-      if (null != processor) {
-        // test chain might be empty or short circuited.
-        processor.processAdd(cmd);
-      }
-
-      return cmd.solrDoc;
-    } finally {
-      SolrRequestInfo.clearRequestInfo();
-      req.close();
-    }
-  }
-
-  protected void processCommit(final String chain) throws IOException {
-    SolrCore core = h.getCore();
-    UpdateRequestProcessorChain pc = core.getUpdateProcessingChain(chain);
-    assertNotNull("No Chain named: " + chain, pc);
-
-    SolrQueryResponse rsp = new SolrQueryResponse();
-
-    SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
-
-    CommitUpdateCommand cmd = new CommitUpdateCommand(req, false);
-    UpdateRequestProcessor processor = pc.createProcessor(req, rsp);
-    try {
-      processor.processCommit(cmd);
-    } finally {
-      req.close();
-    }
-  }
-
-  protected void processDeleteById(final String chain, String id) throws IOException {
-    SolrCore core = h.getCore();
-    UpdateRequestProcessorChain pc = core.getUpdateProcessingChain(chain);
-    assertNotNull("No Chain named: " + chain, pc);
-
-    SolrQueryResponse rsp = new SolrQueryResponse();
-
-    SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
-
-    DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
-    cmd.setId(id);
-    UpdateRequestProcessor processor = pc.createProcessor(req, rsp);
-    try {
-      processor.processDelete(cmd);
-    } finally {
-      req.close();
-    }
-  }
-
-  protected void finish(final String chain) throws IOException {
-    SolrCore core = h.getCore();
-    UpdateRequestProcessorChain pc = core.getUpdateProcessingChain(chain);
-    assertNotNull("No Chain named: " + chain, pc);
-
-    SolrQueryResponse rsp = new SolrQueryResponse();
-    SolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams());
-
-    UpdateRequestProcessor processor = pc.createProcessor(req, rsp);
-    try {
-      processor.finish();
-    } finally {
-      IOUtils.closeQuietly(processor);
-      req.close();
-    }
-  }
-
-
-  /**
-   * Convenience method for building up SolrInputDocuments
-   */
-  final SolrInputDocument doc(SolrInputField... fields) {
-    SolrInputDocument d = new SolrInputDocument();
-    for (SolrInputField f : fields) {
-      d.put(f.getName(), f);
-    }
-    return d;
-  }
-
-  /**
-   * Convenience method for building up SolrInputFields
-   */
-  final SolrInputField field(String name, Object... values) {
-    SolrInputField f = new SolrInputField(name);
-    for (Object v : values) {
-      f.addValue(v);
-    }
-    return f;
-  }
-
-  /**
-   * Convenience method for building up SolrInputFields with default boost
-   */
-  final SolrInputField f(String name, Object... values) {
-    return field(name, values);
-  }
-}

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/3e2f9e62/solr/licenses/opennlp-maxent-3.0.3.jar.sha1
----------------------------------------------------------------------
diff --git a/solr/licenses/opennlp-maxent-3.0.3.jar.sha1 b/solr/licenses/opennlp-maxent-3.0.3.jar.sha1
new file mode 100644
index 0000000..c3c412f
--- /dev/null
+++ b/solr/licenses/opennlp-maxent-3.0.3.jar.sha1
@@ -0,0 +1 @@
+55e39e6b46e71f35229cdd6950e72d8cce3b5fd4