Posted to java-commits@lucene.apache.org by rm...@apache.org on 2009/11/17 20:55:02 UTC

svn commit: r881466 - /lucene/java/trunk/JRE_VERSION_MIGRATION.txt

Author: rmuir
Date: Tue Nov 17 19:55:02 2009
New Revision: 881466

URL: http://svn.apache.org/viewvc?rev=881466&view=rev
Log:
LUCENE-2073: add info about upgrading JVM,unicode

Added:
    lucene/java/trunk/JRE_VERSION_MIGRATION.txt   (with props)

Added: lucene/java/trunk/JRE_VERSION_MIGRATION.txt
URL: http://svn.apache.org/viewvc/lucene/java/trunk/JRE_VERSION_MIGRATION.txt?rev=881466&view=auto
==============================================================================
--- lucene/java/trunk/JRE_VERSION_MIGRATION.txt (added)
+++ lucene/java/trunk/JRE_VERSION_MIGRATION.txt Tue Nov 17 19:55:02 2009
@@ -0,0 +1,36 @@
+If possible, use the same JRE major version at both index and search time.
+When upgrading to a different JRE major version, consider re-indexing. 
+
+Different JRE major versions may implement different versions of Unicode,
+which will change the way some parts of Lucene treat your text.
+
+For example: with Java 1.4, LetterTokenizer will split around the character U+02C6,
+but with Java 5 it will not.
+This is because Java 1.4 implements Unicode 3.0, but Java 5 implements Unicode 4.0.
+
+For reference, JRE major versions with their corresponding Unicode versions:
+Java 1.4, Unicode 3.0
+Java 5, Unicode 4.0
+Java 6, Unicode 4.0
+Java 7, Unicode 6.0
+
+In general, whether or not you need to re-index largely depends upon the data that
+you are searching, and what was changed in any given Unicode version. For example, 
+if you are completely sure that your content is limited to the "Basic Latin" range 
+of Unicode, you can safely ignore this. 
+
+Special Notes:
+
+LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
+
+* StandardAnalyzer will return the same results under Java 5 as it did under 
+Java 1.4. This is because it is largely independent of the runtime JRE for
+Unicode support (with the exception of lowercasing). However, no changes to 
+casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are 
+using this Analyzer you are NOT affected.
+
+* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and 
+LowerCaseTokenizer may return different results, along with many other Analyzers
+and TokenStreams in Lucene's contrib area. If you are using one of these 
+components, you may be affected.
+

Propchange: lucene/java/trunk/JRE_VERSION_MIGRATION.txt
------------------------------------------------------------------------------
    svn:eol-style = native
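
The U+02C6 example in JRE_VERSION_MIGRATION.txt can be checked against whatever JRE is at hand. The following is a minimal sketch, not Lucene code: the class name UnicodeLetterCheck and the helper letterRuns are hypothetical, and the helper only approximates LetterTokenizer on the assumption that it splits wherever Character.isLetter returns false. It needs Java 5 or later to compile.

```java
import java.util.ArrayList;
import java.util.List;

public class UnicodeLetterCheck {

    // Hypothetical stand-in for a letter-based tokenizer: returns the maximal
    // runs of code points that the running JRE classifies as letters.
    static List<String> letterRuns(String text) {
        List<String> runs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current run.
                runs.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // The answers depend on the Unicode version the running JRE implements.
        System.out.println("java.version     = " + System.getProperty("java.version"));
        System.out.println("isLetter(U+02C6) = " + Character.isLetter(0x02C6));
        System.out.println("letter runs      = " + letterRuns("ab\u02C6cd"));
    }
}
```

Running it on both the index-time and the search-time JRE and comparing the output gives a quick indication of whether the two runtimes classify a given character the same way. It does not replace testing with the actual Analyzer in use.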