You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by ho...@apache.org on 2012/07/31 03:38:42 UTC

svn commit: r1367385 - in /lucene/dev/branches/branch_4x: ./ dev-tools/ lucene/ lucene/analysis/ lucene/analysis/icu/src/java/org/apache/lucene/collation/ lucene/backwards/ lucene/benchmark/ lucene/core/ lucene/demo/ lucene/facet/ lucene/grouping/ luce...

Author: hossman
Date: Tue Jul 31 01:38:40 2012
New Revision: 1367385

URL: http://svn.apache.org/viewvc?rev=1367385&view=rev
Log:
SOLR-3650: checkpoint, migrated CHANGES.txt for contrib/uima and contrib/extraction (merge r1367384)

Added:
    lucene/dev/branches/branch_4x/solr/contrib/extraction/README.txt
      - copied unchanged from r1367384, lucene/dev/trunk/solr/contrib/extraction/README.txt
Removed:
    lucene/dev/branches/branch_4x/solr/contrib/extraction/CHANGES.txt
    lucene/dev/branches/branch_4x/solr/contrib/uima/CHANGES.txt
Modified:
    lucene/dev/branches/branch_4x/   (props changed)
    lucene/dev/branches/branch_4x/dev-tools/   (props changed)
    lucene/dev/branches/branch_4x/lucene/   (props changed)
    lucene/dev/branches/branch_4x/lucene/BUILD.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/CHANGES.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/JRE_VERSION_MIGRATION.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/LICENSE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/MIGRATE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/README.txt   (props changed)
    lucene/dev/branches/branch_4x/lucene/analysis/   (props changed)
    lucene/dev/branches/branch_4x/lucene/analysis/icu/src/java/org/apache/lucene/collation/ICUCollationKeyFilterFactory.java   (props changed)
    lucene/dev/branches/branch_4x/lucene/backwards/   (props changed)
    lucene/dev/branches/branch_4x/lucene/benchmark/   (props changed)
    lucene/dev/branches/branch_4x/lucene/build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/common-build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/core/   (props changed)
    lucene/dev/branches/branch_4x/lucene/demo/   (props changed)
    lucene/dev/branches/branch_4x/lucene/facet/   (props changed)
    lucene/dev/branches/branch_4x/lucene/grouping/   (props changed)
    lucene/dev/branches/branch_4x/lucene/highlighter/   (props changed)
    lucene/dev/branches/branch_4x/lucene/ivy-settings.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/join/   (props changed)
    lucene/dev/branches/branch_4x/lucene/licenses/   (props changed)
    lucene/dev/branches/branch_4x/lucene/memory/   (props changed)
    lucene/dev/branches/branch_4x/lucene/misc/   (props changed)
    lucene/dev/branches/branch_4x/lucene/module-build.xml   (props changed)
    lucene/dev/branches/branch_4x/lucene/queries/   (props changed)
    lucene/dev/branches/branch_4x/lucene/queryparser/   (props changed)
    lucene/dev/branches/branch_4x/lucene/sandbox/   (props changed)
    lucene/dev/branches/branch_4x/lucene/site/   (props changed)
    lucene/dev/branches/branch_4x/lucene/spatial/   (props changed)
    lucene/dev/branches/branch_4x/lucene/suggest/   (props changed)
    lucene/dev/branches/branch_4x/lucene/test-framework/   (props changed)
    lucene/dev/branches/branch_4x/lucene/tools/   (props changed)
    lucene/dev/branches/branch_4x/solr/   (props changed)
    lucene/dev/branches/branch_4x/solr/CHANGES.txt   (contents, props changed)
    lucene/dev/branches/branch_4x/solr/LICENSE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/README.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/build.xml   (props changed)
    lucene/dev/branches/branch_4x/solr/cloud-dev/   (props changed)
    lucene/dev/branches/branch_4x/solr/common-build.xml   (props changed)
    lucene/dev/branches/branch_4x/solr/contrib/   (props changed)
    lucene/dev/branches/branch_4x/solr/contrib/uima/README.txt
    lucene/dev/branches/branch_4x/solr/core/   (props changed)
    lucene/dev/branches/branch_4x/solr/dev-tools/   (props changed)
    lucene/dev/branches/branch_4x/solr/example/   (props changed)
    lucene/dev/branches/branch_4x/solr/lib/   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpclient-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpclient-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpcore-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpcore-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpmime-LICENSE-ASL.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/licenses/httpmime-NOTICE.txt   (props changed)
    lucene/dev/branches/branch_4x/solr/scripts/   (props changed)
    lucene/dev/branches/branch_4x/solr/solrj/   (props changed)
    lucene/dev/branches/branch_4x/solr/test-framework/   (props changed)
    lucene/dev/branches/branch_4x/solr/testlogging.properties   (props changed)
    lucene/dev/branches/branch_4x/solr/webapp/   (props changed)

Modified: lucene/dev/branches/branch_4x/solr/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/CHANGES.txt?rev=1367385&r1=1367384&r2=1367385&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/solr/CHANGES.txt (original)
+++ lucene/dev/branches/branch_4x/solr/CHANGES.txt Tue Jul 31 01:38:40 2012
@@ -1041,6 +1041,11 @@ New Features
   support numeric collation, ignore punctuation/whitespace, ignore accents but
   not case, control whether upper/lowercase values are sorted first, etc.  (rmuir)
 
+* SOLR-2346: Add a chance to set content encoding explicitly via content type 
+  of stream for extracting request handler.  This is convenient when Tika's 
+  auto detector cannot detect encoding, especially the text file is too short 
+  to detect encoding. (koji)
+
 Optimizations
 ----------------------
 * SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New parameter
@@ -1282,6 +1287,12 @@ Other Changes
   repository).  Also updated dependencies jackson-core-asl and
   jackson-mapper-asl (both v1.5.2 -> v1.7.4).  (Dawid Weiss, Steve Rowe)
 
+* SOLR-3295: netcdf jar is excluded from the binary release (and disabled in 
+  ivy.xml) because it requires java 6. If you want to parse this content with 
+  extracting request handler and are willing to use java 6, just add the jar. 
+  (rmuir)
+
+
 Build
 ----------------------
 * SOLR-2487: Add build target to package war without slf4j jars (janhoy)
@@ -1415,6 +1426,9 @@ Bug Fixes
 
 * SOLR-2591: Remove commitLockTimeout option from solrconfig.xml (Luca Cavanna via Martijn van Groningen)
 
+* SOLR-2746: Upgraded UIMA dependencies from *-2.3.1-SNAPSHOT.jar to *-2.3.1.jar.
+
+
 ==================  3.4.0  ==================
 
 Upgrading from Solr 3.3
@@ -1573,6 +1587,9 @@ Bug Fixes
 * SOLR-2629: Eliminate deprecation warnings in some JSPs.
   (Bernd Fehling, hossman)
 
+* SOLR-2743: Remove commons logging from contrib/extraction. (koji)
+
+
 Build
 ----------------------
 
@@ -1644,6 +1661,13 @@ New Features
 
 * SOLR-2610 -- Add an option to delete index through CoreAdmin UNLOAD action (shalin)
 
+* SOLR-2480: Add ignoreTikaException flag to the extraction request handler so 
+  that users can ignore TikaException but index meta data. 
+  (Shinichiro Abe, koji)
+
+* SOLR-2582: Use uniqueKey for error log in UIMAUpdateRequestProcessor.
+  (Tommaso Teofili via koji)
+
 Optimizations
 ----------------------
 
@@ -1663,6 +1687,12 @@ Bug Fixes
   parameter is added to avoid excessive CPU time in extreme cases (e.g. long
   queries with many misspelled words).  (James Dyer via rmuir)
 
+* SOLR-2579: UIMAUpdateRequestProcessor ignore error fails if text.length() < 100.
+  (Elmer Garduno via koji)
+
+* SOLR-2581: UIMAToSolrMapper wrongly instantiates Type with reflection.
+  (Tommaso Teofili via koji)
+
 Other Changes
 ----------------------
 
@@ -1702,6 +1732,10 @@ Upgrading from Solr 3.1
   with update.chain rather than update.processor. The latter still works,
   but has been deprecated.
 
+* <uimaConfig/> just beneath <config> ... </config> is no longer supported.
+  It should move to UIMAUpdateRequestProcessorFactory setting.
+  See contrib/uima/README.txt for more details. (SOLR-2436)
+
 Detailed Change List
 ----------------------
 
@@ -1728,6 +1762,12 @@ New Features
   for clustering (SOLR-2450), output of cluster scores (SOLR-2505)
   (Stanislaw Osinski, Dawid Weiss).
 
+* SOLR-2503: extend UIMAUpdateRequestProcessorFactory mapping function to 
+  map feature value to dynamicField. (koji)
+
+* SOLR-2512: add ignoreErrors flag to UIMAUpdateRequestProcessorFactory so 
+  that users can ignore exceptions in AE. (Tommaso Teofili, koji)
+
 Optimizations
 ----------------------
 
@@ -1814,6 +1854,12 @@ Other Changes
 * SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml,
   because html encoding confuses non-ascii users. (koji)
 
+* SOLR-2387: add mock annotators for improved testing in contrib/uima,
+  (Tommaso Teofili via rmuir)
+
+* SOLR-2436: move uimaConfig to under the uima's update processor in 
+  solrconfig.xml.  (Tommaso Teofili, koji)
+
 Build
 ----------------------
 
@@ -2373,6 +2419,12 @@ Bug Fixes
 * SOLR-1692: Fix bug in clustering component relating to carrot.produceSummary 
   option (gsingers)
 
+* SOLR-1756: The date.format setting for extraction request handler causes 
+  ClassCastException when enabled and the config code that parses this setting 
+  does not properly use the same iterator instance. 
+  (Christoph Brill, Mark Miller)
+
+
 Other Changes
 ----------------------
 
@@ -2500,6 +2552,10 @@ Other Changes
 * SOLR-141: Errors and Exceptions are formated by ResponseWriter.
   (Mike Sokolov, Rich Cariens, Daniel Naber, ryan)
 
+* SOLR-1902: Upgraded to Tika 0.8 and changed deprecated parse call
+
+* SOLR-1813: Add ICU4j to contrib/extraction libs and add tests for Arabic 
+  extraction (Robert Muir via gsingers)
 
 Build
 ----------------------
@@ -2870,6 +2926,11 @@ New Features
 84. SOLR-1449: Add <lib> elements to solrconfig.xml to specifying additional
     classpath directories and regular expressions. (hossman via yonik)
 
+85. SOLR-1128: Added metadata output to extraction request handler "extract 
+    only" option.  (gsingers)
+
+86. SOLR-1274: Added text serialization output for extractOnly 
+    (Peter Wolanin, gsingers)  
 
 Optimizations
 ----------------------
@@ -3284,6 +3345,14 @@ Other Changes
 
 50. SOLR-1357 SolrInputDocument cannot process dynamic fields (Lars Grote via noble)
 
+51. SOLR-1075: Upgrade to Tika 0.3.  See http://www.apache.org/dist/lucene/tika/CHANGES-0.3.txt (gsingers)
+
+52. SOLR-1310: Upgrade to Tika 0.4. Note there are some differences in 
+    detecting Languages now in extracting request handler.
+    See http://www.lucidimagination.com/search/document/d6f1899a85b2a45c/vote_apache_tika_0_4_release_candidate_2#d6f1899a85b2a45c
+    for discussion on language detection.
+    See http://www.apache.org/dist/lucene/tika/CHANGES-0.4.txt. (gsingers)
+
 Build
 ----------------------
  1. SOLR-776: Added in ability to sign artifacts via Ant for releases (gsingers)

Modified: lucene/dev/branches/branch_4x/solr/contrib/uima/README.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/contrib/uima/README.txt?rev=1367385&r1=1367384&r2=1367385&view=diff
==============================================================================
--- lucene/dev/branches/branch_4x/solr/contrib/uima/README.txt (original)
+++ lucene/dev/branches/branch_4x/solr/contrib/uima/README.txt Tue Jul 31 01:38:40 2012
@@ -1,3 +1,15 @@
+Apache Solr UIMA Metadata Extraction Library
+
+Introduction
+------------
+This module is intended to be used both as an UpdateRequestProcessor while indexing documents and as a set of tokenizer/filters
+to be configured inside the schema.xml for use during analysis phase.
+UIMAUpdateRequestProcessor purpose is to provide additional on the fly automatically generated fields to the Solr index.
+Such fields could be language, concepts, keywords, sentences, named entities, etc.
+UIMA based tokenizers/filters can be used either inside plain Lucene or as index/query analyzers to be defined
+inside the schema.xml of a Solr core to create/filter tokens using specific UIMA annotations.
+
+
 Getting Started
 ---------------
 To start using Solr UIMA Metadata Extraction Library you should go through the following configuration steps: