Posted to commits@lucene.apache.org by sa...@apache.org on 2011/05/22 23:45:45 UTC

svn commit: r1126234 [3/28] - in /lucene/dev/branches/solr2452: ./ dev-tools/eclipse/ dev-tools/idea/ dev-tools/idea/.idea/ dev-tools/idea/lucene/ dev-tools/idea/lucene/contrib/ant/ dev-tools/idea/lucene/contrib/db/bdb-je/ dev-tools/idea/lucene/contrib...

Modified: lucene/dev/branches/solr2452/lucene/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/CHANGES.txt?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/CHANGES.txt (original)
+++ lucene/dev/branches/solr2452/lucene/CHANGES.txt Sun May 22 21:45:19 2011
@@ -141,6 +141,11 @@ Changes in backwards compatibility polic
 * LUCENE-2315: AttributeSource's methods for accessing attributes are now final,
   else it's easy to corrupt the internal states.  (Uwe Schindler)
 
+* LUCENE-2814: The IndexWriter.flush method no longer takes a "boolean
+  flushDocStores" argument, as we now always flush doc stores (index
+  files holding stored fields and term vectors) while flushing a
+  segment.  (Mike McCandless)
+
 Changes in Runtime Behavior
 
 * LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions, if you
@@ -157,18 +162,76 @@ Changes in Runtime Behavior
 * LUCENE-2720: IndexWriter throws IndexFormatTooOldException on open, rather 
   than later when e.g. a merge starts. (Shai Erera, Mike McCandless, Uwe Schindler)
 
-* LUCENE-1076: The default merge policy is now able to merge
-  non-contiguous segments, which means docIDs no longer necessarily
-  say "in order".  If this is a problem then you can use either of the
-  LogMergePolicy impls, and call setRequireContiguousMerge(true).
-  (Mike McCandless)
-  
 * LUCENE-2881: FieldInfos is now tracked per segment.  Before it was tracked
   per IndexWriter session, which resulted in FieldInfos that had the FieldInfo
   properties from all previous segments combined. Field numbers are now tracked
   globally across IndexWriter sessions and persisted into a X.fnx file on
   successful commit. The corresponding file format changes are backwards-
   compatible. (Michael Busch, Simon Willnauer)
+  
+* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from 
+  DocumentsWriterPerThread:
+
+  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
+    Each DocumentsWriterPerThread indexes documents in its own private segment,
+    and the in-memory segments are no longer merged on flush.  Instead, each
+    segment is separately flushed to disk and subsequently merged with normal
+    segment merging.
+
+  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
+    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
+    indexing may continue concurrently with flushing.  The selected
+    DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
+    don't flush all RAM resident documents but only the documents private to
+    the DWPT selected for flushing.
+  
+  - Flushing is now controlled by a FlushPolicy that is called for every add,
+    update or delete on IndexWriter. By default DWPTs are flushed either on
+    maxBufferedDocs per DWPT or on the global active memory used. Once the
+    active memory exceeds ramBufferSizeMB only the largest DWPT is selected for
+    flushing; the memory used by this DWPT is subtracted from the active
+    memory and added to a flushing memory pool, which can lead to temporarily
+    higher memory usage due to ongoing indexing.
+    
+  - IndexWriter can now utilize ramBufferSize > 2048 MB. Each DWPT can address
+    up to 2048 MB of memory, so the ramBufferSize is now bounded by the max
+    number of DWPTs available in the used DocumentsWriterPerThreadPool.
+    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit
+    if the application can use all available DWPTs. To prevent a DWPT from
+    exhausting its address space, IndexWriter will forcefully flush a DWPT if
+    its hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be
+    controlled via IndexWriterConfig and defaults to 1945 MB (see the
+    configuration sketch after this list).
+    Since IndexWriter flushes DWPTs concurrently, not all memory is released
+    immediately. Applications should still use a ramBufferSize significantly
+    lower than the JVM's available heap memory, since under high load multiple
+    flushing DWPTs can consume substantial transient memory when IO performance
+    is slow relative to the indexing rate.
+    
+  - IndexWriter#commit no longer blocks concurrent indexing while flushing all
+    'currently' RAM resident documents to disk. However, flushes that occur
+    while a full flush is running are queued and will happen after all DWPTs
+    involved in the full flush are done flushing. Applications that use
+    multiple indexing threads and trigger a full flush (e.g. call commit() or
+    open a new NRT reader) can use significantly more transient memory.
+    
+  - IndexWriter#addDocument and IndexWriter#updateDocument can block indexing
+    threads if the number of active plus flushing DWPTs exceeds a safety
+    limit. By default this happens once 2 * the max number of available thread
+    states (DWPTPool) is exceeded. This safety limit prevents applications from
+    exhausting their available memory if flushing can't keep up with
+    concurrently indexing threads.
+    
+  - IndexWriter only applies and flushes deletes if the maxBufferedDeleteTerms
+    limit is reached during indexing. No segment flushes will be triggered
+    due to this setting.
+    
+  - IndexWriter#flush(boolean, boolean) is no longer synchronized on
+    IndexWriter. A dedicated flushLock has been introduced to prevent multiple
+    full flushes from happening concurrently.
+    
+  - DocumentsWriter no longer writes shared doc stores.
+  
+  (Mike McCandless, Michael Busch, Simon Willnauer)
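+
+  A minimal configuration sketch for the settings above (the values are
+  illustrative, not recommendations; 'dir' and 'analyzer' are assumed to
+  be in scope):
+
+    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
+    // global trigger: flush the largest DWPT once active RAM exceeds this
+    conf.setRAMBufferSizeMB(512.0);
+    // per-DWPT hard limit; a DWPT exceeding it is forcefully flushed
+    conf.setRAMPerThreadHardLimitMB(1945);
+    IndexWriter writer = new IndexWriter(dir, conf);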
 
 API Changes
 
@@ -267,9 +330,6 @@ New features
 * LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs) which
   lets you set the Codec per field (Mike McCandless)
 
-* LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
-  (Tim Smith, Grant Ingersoll)
-
 * LUCENE-2373: Extend CodecProvider to use SegmentInfosWriter and
   SegmentInfosReader to allow customization of SegmentInfos data.
   (Andrzej Bialecki)
@@ -299,9 +359,6 @@ New features
   use MultiFields static methods directly, instead) if you need to use
   the flex APIs directly on a composite reader.  (Mike McCandless)
 
-* LUCENE-2692: Added several new SpanQuery classes for positional checking
-  (match is in a range, payload is a specific value) (Grant Ingersoll)  
-  
 * LUCENE-2690: MultiTermQuery boolean rewrites per segment.
   (Uwe Schindler, Robert Muir, Mike McCandless, Simon Willnauer)
 
@@ -333,10 +390,14 @@ New features
 
 * LUCENE-2862: Added TermsEnum.totalTermFreq() and
   Terms.getSumTotalTermFreq().  (Mike McCandless, Robert Muir)
-  
-* LUCENE-3001: Added TrieFieldHelper to write solr compatible numeric
-  fields without the solr dependency. (ryan)
-  
+
+* LUCENE-3003: Added new expert class oal.index.DocTermsOrd,
+  refactored from Solr's UnInvertedField, for accessing term ords for
+  multi-valued fields, per document.  This is similar to FieldCache in
+  that it inverts the index to compute the ords, but differs in that
+  it's able to handle multi-valued fields and does not hold the term
+  bytes in RAM. (Mike McCandless)
+
 Optimizations
 
 * LUCENE-2588: Don't store unnecessary suffixes when writing the terms
@@ -366,38 +427,126 @@ Bug fixes
   with more document deletions is requested before a reader with fewer
   deletions, provided they share some segments. (yonik)
 
-* LUCENE-2936: PhraseQuery score explanations were not correctly 
-  identifying matches vs non-matches.  (hossman)
+======================= Lucene 3.x (not yet released) =======================
 
-* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new 
-  indexes, causing existing deletions to be applied on the incoming indexes as 
-  well. (Shai Erera, Mike McCandless)
+Changes in backwards compatibility policy
 
-Test Cases
+* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
+  with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
+  a method getHeapArray() was added to retrieve the internal heap array as a
+  non-generic Object[].  (Uwe Schindler, Yonik Seeley)
 
-* LUCENE-3002: added 'tests.iter.min' to control 'tests.iter' by allowing to 
-  stop iterating if at least 'tests.iter.min' ran and a failure occured. 
-  (Shai Erera, Chris Hostetter)
+* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
+  (Mike McCandless, Shai Erera)
 
-Build
+* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
+  SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
+  to no longer extend Vector<SegmentInfo> (to update code that uses the
+  Vector API, use the new asList() and asSet() methods returning unmodifiable
+  collections; modifying SegmentInfos is now only possible through
+  the explicitly declared methods). IndexWriter.segString() now takes
+  Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
+  should fix this. MergePolicy and SegmentInfos are internal/experimental
+  APIs not covered by the strict backwards compatibility policy.
+  (Uwe Schindler, Mike McCandless)
 
-* LUCENE-3006: Building javadocs will fail on warnings by default.  Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
+Changes in runtime behavior
 
+* LUCENE-3065: When a NumericField is retrieved from a Document loaded
+  from IndexReader (or IndexSearcher), it will now come back as a
+  NumericField, not as a Field with a string-ified version of the
+  numeric value you had indexed.  Note that this only applies to
+  newly-indexed Documents; older indices will still return Field
+  with the string-ified numeric value. If you call Document.get(),
+  the value still comes back as a String, but Document.getFieldable()
+  returns NumericField instances. (Uwe Schindler, Ryan McKinley,
+  Mike McCandless)
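+
+  A sketch of the new behavior (assumes a NumericField named "price" was
+  indexed into a current index; 'searcher' and 'docID' are in scope):
+
+    Document doc = searcher.doc(docID);
+    Fieldable f = doc.getFieldable("price");
+    if (f instanceof NumericField) {
+      Number value = ((NumericField) f).getNumericValue();
+    }
+    String s = doc.get("price"); // still the string-ified form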
 
-======================= Lucene 3.x (not yet released) =======================
+* LUCENE-1076: Changed the default merge policy from
+  LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
+  (passed to IndexWriterConfig), which is able to merge non-contiguous
+  segments. This means docIDs no longer necessarily stay "in order"
+  during indexing.  If this is a problem then you can use either of
+  the LogMergePolicy impls.  (Mike McCandless)
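+
+  To keep the old behavior, a minimal sketch ('analyzer' is assumed in
+  scope; LogByteSizeMergePolicy was the previous default):
+
+    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_32, analyzer);
+    conf.setMergePolicy(new LogByteSizeMergePolicy());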
+  
+New features
 
-Changes in backwards compatibility policy
+* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
+  that allows upgrading all segments to the most recent supported index
+  format without fully optimizing.  (Uwe Schindler, Mike McCandless)
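+
+  A minimal usage sketch (the path is hypothetical; IndexUpgrader also
+  has a command-line main()):
+
+    Directory dir = FSDirectory.open(new File("/path/to/index"));
+    new IndexUpgrader(dir).upgrade();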
+
+* LUCENE-1076: Added TieredMergePolicy, which is able to merge non-contiguous
+  segments; this means docIDs no longer necessarily stay "in order".
+  (Mike McCandless, Shai Erera)
+
+* LUCENE-3071: Added ReversePathHierarchyTokenizer and a skip parameter to
+  PathHierarchyTokenizer (Olivier Favre via ryan)
+
+* LUCENE-1421, LUCENE-3102: added CachingCollector which allows you to cache
+  document IDs and scores encountered during the search, and "replay" them to
+  another Collector. (Mike McCandless, Shai Erera)
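+
+  A usage sketch (the wrapped collectors and the 64 MB RAM bound are
+  illustrative):
+
+    CachingCollector cache = CachingCollector.create(otherCollector, true, 64.0);
+    searcher.search(query, cache);
+    if (cache.isCached()) {  // false if the RAM bound was exceeded
+      cache.replay(anotherCollector);
+    }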
 
-* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
-  with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
-  a method getHeapArray() was added to retrieve the internal heap array as a
-  non-generic Object[].  (Uwe Schindler, Yonik Seeley)
+API Changes
+
+* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
+  (though @lucene.experimental), allowing for custom MergeScheduler 
+  implementations. (Shai Erera)
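+
+  A minimal custom scheduler sketch enabled by this change (it simply
+  mirrors SerialMergeScheduler):
+
+    public class SerialLikeMergeScheduler extends MergeScheduler {
+      @Override
+      public synchronized void merge(IndexWriter writer)
+          throws CorruptIndexException, IOException {
+        MergePolicy.OneMerge merge;
+        // drain and run all pending merges, one at a time
+        while ((merge = writer.getNextMerge()) != null) {
+          writer.merge(merge);
+        }
+      }
+      @Override
+      public void close() {}
+    }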
+
+* LUCENE-3065: Document.getField() was deprecated, as it throws
+  ClassCastException when loading lazy fields or NumericFields.
+  (Uwe Schindler, Ryan McKinley, Mike McCandless)
+
+* LUCENE-2027: Directory.touchFile is deprecated and will be removed
+  in 4.0.  (Mike McCandless)
 
 Optimizations
 
 * LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
   on empty or one-element lists/arrays.  (Uwe Schindler)
 
+* LUCENE-2897: Apply deleted terms while flushing a segment.  We still
+  buffer deleted terms to later apply to past segments.  (Mike McCandless)
+
+Bug fixes
+
+* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new 
+  indexes, causing existing deletions to be applied on the incoming indexes as 
+  well. (Shai Erera, Mike McCandless)
+
+* LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
+  seeking TermEnum (e.g. used by Solr's faceting) (Tom Burton-West, Mike
+  McCandless)
+
+* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
+  chain after it was already (partly) consumed [or clearAttributes(),
+  captureState(), cloneAttributes(),... was called by the Tokenizer],
+  the Tokenizer calling clearAttributes() or capturing state after addition
+  might not do this on the newly added Attribute. This bug affected only
+  very special use cases of the TokenStream API; most users would not
+  have noticed it.  (Uwe Schindler, Robert Muir)
+
+* LUCENE-3054: PhraseQuery could in some cases stack overflow in
+  SorterTemplate.quickSort(). This fix also adds an optimization to
+  PhraseQuery, as terms with lower doc freq will also have fewer positions.
+  (Uwe Schindler, Robert Muir, Otis Gospodnetic)
+
+* LUCENE-3068: sloppy phrase query failed to match valid documents when
+  multiple query terms had the same position in the query. (Doron Cohen)
+
+* LUCENE-3012: Lucene now writes the header for separate norm files (*.sNNN).
+  (Robert Muir)
+
+Build
+
+* LUCENE-3006: Building javadocs will fail on warnings by default. 
+  Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
+
+Test Cases
+
+* LUCENE-3002: added 'tests.iter.min' to control 'tests.iter' by allowing
+  iteration to stop if at least 'tests.iter.min' iterations ran and a
+  failure occurred.  (Shai Erera, Chris Hostetter)
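+
+  A hypothetical invocation (the test class and iteration counts are
+  illustrative):
+
+    ant test -Dtestcase=TestIndexWriter -Dtests.iter=20 -Dtests.iter.min=3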
+
 ======================= Lucene 3.1.0 =======================
 
 Changes in backwards compatibility policy
@@ -936,6 +1085,12 @@ New features
 
 * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
 
+* LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
+  (Tim Smith, Grant Ingersoll)
+
+* LUCENE-2692: Added several new SpanQuery classes for positional checking
+  (match is in a range, payload is a specific value) (Grant Ingersoll)  
+
 Optimizations
 
 * LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
@@ -1379,6 +1534,10 @@ Bug fixes
   that warming is free to do whatever it needs to.  (Earwin Burrfoot
   via Mike McCandless)
 
+* LUCENE-3029: Fixed a corner case where MultiPhraseQuery, used with zero
+  position-increment tokens, would sometimes assign different scores to
+  identical docs.  (Mike McCandless)
+
 * LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
   files when a mergedSegmentWarmer is set on IndexWriter.  (Mike
   McCandless)

Modified: lucene/dev/branches/solr2452/lucene/MIGRATE.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/MIGRATE.txt?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/MIGRATE.txt (original)
+++ lucene/dev/branches/solr2452/lucene/MIGRATE.txt Sun May 22 21:45:19 2011
@@ -312,6 +312,8 @@ LUCENE-1458, LUCENE-2111: Flexible Index
     - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
     - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
     - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
+    - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
+    - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
 
 * LUCENE-2514: The option to use a Collator's order (instead of binary order) for
   sorting and range queries has been moved to contrib/queries.
@@ -356,3 +358,9 @@ LUCENE-1458, LUCENE-2111: Flexible Index
   field as a parameter, this is removed due to the fact the entire Similarity (all methods)
   can now be configured per-field.
   Methods that apply to the entire query such as coord() and queryNorm() exist in SimilarityProvider.
+
+* LUCENE-1076: TieredMergePolicy is now the default merge policy.
+  It's able to merge non-contiguous segments; this may cause problems
+  for applications that rely on Lucene's internal document ID
+  assignment.  If so, you should instead use LogByteSize/DocMergePolicy
+  during indexing.

Modified: lucene/dev/branches/solr2452/lucene/common-build.xml
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/common-build.xml?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/common-build.xml (original)
+++ lucene/dev/branches/solr2452/lucene/common-build.xml Sun May 22 21:45:19 2011
@@ -73,6 +73,7 @@
   </condition>
   <property name="tests.multiplier" value="1" />
   <property name="tests.codec" value="randomPerField" />
+  <property name="tests.codecprovider" value="random" />
   <property name="tests.locale" value="random" />
   <property name="tests.timezone" value="random" />
   <property name="tests.directory" value="random" />
@@ -308,7 +309,7 @@
     </copy>
   </target>
 
-  <target name="compile" depends="compile-core, validate-lucene">
+  <target name="compile" depends="compile-core">
     <!-- convenience target to compile core -->
   </target>
 
@@ -499,6 +500,8 @@
 	      <sysproperty key="tests.verbose" value="${tests.verbose}"/>
               <!-- set the codec tests should run with -->
 	      <sysproperty key="tests.codec" value="${tests.codec}"/>
+              <!-- set the codec provider tests should run with -->
+	      <sysproperty key="tests.codecprovider" value="${tests.codecprovider}"/>
               <!-- set the locale tests should run with -->
 	      <sysproperty key="tests.locale" value="${tests.locale}"/>
               <!-- set the timezone tests should run with -->
@@ -568,7 +571,7 @@
   	</sequential>
   </macrodef>
 	
-  <target name="test" depends="compile-test,junit-mkdir,junit-sequential,junit-parallel" description="Runs unit tests"/>
+  <target name="test" depends="compile-test,validate-lucene,junit-mkdir,junit-sequential,junit-parallel" description="Runs unit tests"/>
 
   <target name="junit-mkdir">
   	<mkdir dir="${junit.output.dir}"/>

Modified: lucene/dev/branches/solr2452/lucene/contrib/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/CHANGES.txt?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/CHANGES.txt (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/CHANGES.txt Sun May 22 21:45:19 2011
@@ -42,10 +42,68 @@ API Changes
    Instead, use SimilarityProvider to return different SweetSpotSimilaritys
    for different fields, this way all parameters (such as TF factors) can be 
    customized on a per-field basis.  (Robert Muir)
+   
+Bug Fixes
+
+ * LUCENE-3045: fixed QueryNodeImpl.containsTag(String key), which was
+   not lowercasing the key before checking for the tag (Adriano Crestani)
 
 ======================= Lucene 3.x (not yet released) =======================
 
-(No changes)
+Changes in runtime behavior
+
+ * LUCENE-3086: ItalianAnalyzer now uses ElisionFilter with a set of Italian
+   contractions by default.  (Robert Muir)
+
+Bug Fixes
+
+ * LUCENE-3045: fixed QueryNodeImpl.containsTag(String key), which was
+   not lowercasing the key before checking for the tag (Adriano Crestani)
+
+ * LUCENE-3026: SmartChineseAnalyzer's WordTokenFilter threw NullPointerException
+   on sentences longer than 32,767 characters.  (wangzhenghang via Robert Muir)
+   
+ * LUCENE-2939: Highlighter should try to use maxDocCharsToAnalyze in
+   WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as
+   when using CachingTokenStream. This can be a significant performance bug for
+   large documents. (Mark Miller)
+
+ * LUCENE-3043: GermanStemmer threw IndexOutOfBoundsException if it encountered
+   a zero-length token.  (Robert Muir)
+   
+ * LUCENE-3044: ThaiWordFilter didn't reset its cached state correctly; this
+   only caused a problem if you consumed a tokenstream, then reused it, added
+   different attributes to it, and consumed it again.  (Robert Muir, Uwe Schindler)
+
+ * LUCENE-3113: Fixed some minor analysis bugs: double-reset() in ReusableAnalyzerBase
+   and ShingleAnalyzerWrapper, missing end() implementations in PrefixAwareTokenFilter
+   and PrefixAndSuffixAwareTokenFilter, invocations of incrementToken() after it
+   already returned false in CommonGramsQueryFilter, HyphenatedWordsFilter,
+   ShingleFilter, and SynonymsFilter.  (Robert Muir, Steven Rowe, Uwe Schindler)
+
+New Features
+
+ * LUCENE-3016: Add analyzer for Latvian.  (Robert Muir)
+
+ * LUCENE-1421: create new grouping contrib module, enabling search
+   results to be grouped by a single-valued indexed field.  This
+   module was factored out of Solr's grouping implementation, but
+   it cannot group by function queries or arbitrary queries.  (Mike
+   McCandless)
+
+ * LUCENE-3098: add AllGroupsCollector, to collect all unique groups
+   (but in unspecified order).  (Martijn van Groningen via Mike
+   McCandless)
+
+ * LUCENE-3092: Added NRTCachingDirectory in contrib/misc, which
+   caches small segments in RAM.  This is useful, in the near-real-time
+   case where the indexing rate is lowish but the reopen rate is
+   highish, to take load off the IO system.  (Mike McCandless)
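+
+   A wrapping sketch (the size thresholds and path are illustrative;
+   'conf' is an IndexWriterConfig assumed to be in scope):
+
+     Directory fsDir = FSDirectory.open(new File("/path/to/index"));
+     NRTCachingDirectory cachedDir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
+     IndexWriter writer = new IndexWriter(cachedDir, conf);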
+
+Optimizations
+
+ * LUCENE-3040: Switch all analysis consumers (highlighter, morelikethis, memory, ...)
+   over to reusableTokenStream().  (Robert Muir)
 
 ======================= Lucene 3.1.0 =======================
 
@@ -156,6 +214,10 @@ Bug fixes
  * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter.
    (Robert Muir)
 
+ * LUCENE-3087: Highlighter: fixed a case that prevented highlighting
+   of an exact phrase when tokens overlap. (Pierre Gossé via Mike
+   McCandless)
+
 API Changes
 
  * LUCENE-2867: Some contrib queryparser methods that receives CharSequence as

Modified: lucene/dev/branches/solr2452/lucene/contrib/ant/src/java/org/apache/lucene/ant/IndexTask.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/ant/src/java/org/apache/lucene/ant/IndexTask.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/ant/src/java/org/apache/lucene/ant/IndexTask.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/ant/src/java/org/apache/lucene/ant/IndexTask.java Sun May 22 21:45:19 2011
@@ -39,7 +39,7 @@ import org.apache.lucene.document.Docume
 import org.apache.lucene.document.Field;
 import org.apache.lucene.index.IndexWriter;
 import org.apache.lucene.index.IndexWriterConfig;
-import org.apache.lucene.index.LogMergePolicy;
+import org.apache.lucene.index.TieredMergePolicy;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.index.IndexWriterConfig.OpenMode;
 import org.apache.lucene.search.IndexSearcher;
@@ -285,9 +285,9 @@ public class IndexTask extends Task {
       IndexWriterConfig conf = new IndexWriterConfig(
           Version.LUCENE_CURRENT, analyzer).setOpenMode(
           create ? OpenMode.CREATE : OpenMode.APPEND);
-      LogMergePolicy lmp = (LogMergePolicy) conf.getMergePolicy();
-      lmp.setUseCompoundFile(useCompoundIndex);
-      lmp.setMergeFactor(mergeFactor);
+      TieredMergePolicy tmp = (TieredMergePolicy) conf.getMergePolicy();
+      tmp.setUseCompoundFile(useCompoundIndex);
+      tmp.setMaxMergeAtOnce(mergeFactor);
       IndexWriter writer = new IndexWriter(dir, conf);
       int totalFiles = 0;
       int totalIndexed = 0;

Modified: lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/HtmlDocumentTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/HtmlDocumentTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/HtmlDocumentTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/HtmlDocumentTest.java Sun May 22 21:45:19 2011
@@ -17,8 +17,6 @@ package org.apache.lucene.ant;
  * limitations under the License.
  */
 
-import java.io.IOException;
-
 import org.apache.lucene.ant.DocumentTestCase;
 import org.apache.lucene.ant.HtmlDocument;
 
@@ -27,7 +25,8 @@ public class HtmlDocumentTest extends Do
     HtmlDocument doc;
     
     @Override
-    public void setUp() throws IOException {
+    public void setUp() throws Exception {
+        super.setUp();
         doc = new HtmlDocument(getFile("test.html"));
     }
     
@@ -37,8 +36,9 @@ public class HtmlDocumentTest extends Do
     }
     
     @Override
-    public void tearDown() {
+    public void tearDown() throws Exception {
         doc = null;
+        super.tearDown();
     }
 }
 

Modified: lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/TextDocumentTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/TextDocumentTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/TextDocumentTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/ant/src/test/org/apache/lucene/ant/TextDocumentTest.java Sun May 22 21:45:19 2011
@@ -17,8 +17,6 @@ package org.apache.lucene.ant;
  * limitations under the License.
  */
 
-import java.io.IOException;
-
 import org.apache.lucene.ant.DocumentTestCase;
 import org.apache.lucene.ant.TextDocument;
 
@@ -27,7 +25,8 @@ public class TextDocumentTest extends Do
     TextDocument doc;
     
     @Override
-    public void setUp() throws IOException {
+    public void setUp() throws Exception {
+        super.setUp();
         doc = new TextDocument(getFile("test.txt"));
     }
     
@@ -36,8 +35,9 @@ public class TextDocumentTest extends Do
     }
     
     @Override
-    public void tearDown() {
+    public void tearDown() throws Exception {
         doc = null;
+        super.tearDown();
     }
 }
 

Modified: lucene/dev/branches/solr2452/lucene/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEDirectory.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEDirectory.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEDirectory.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/db/bdb-je/src/java/org/apache/lucene/store/je/JEDirectory.java Sun May 22 21:45:19 2011
@@ -199,17 +199,6 @@ public class JEDirectory extends Directo
         return new JELock();
     }
 
-    @Override
-    public void touchFile(String name) throws IOException {
-        File file = new File(name);
-        long length = 0L;
-
-        if (file.exists(this))
-            length = file.getLength();
-
-        file.modify(this, length, System.currentTimeMillis());
-    }
-
     /**
      * Once a transaction handle was committed it is no longer valid. In order
      * to continue using this JEDirectory instance after a commit, the

Modified: lucene/dev/branches/solr2452/lucene/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/db/bdb/src/java/org/apache/lucene/store/db/DbDirectory.java Sun May 22 21:45:19 2011
@@ -222,19 +222,6 @@ public class DbDirectory extends Directo
         return new DbLock();
     }
 
-    @Override
-    public void touchFile(String name)
-        throws IOException
-    {
-        File file = new File(name);
-        long length = 0L;
-
-        if (file.exists(this))
-            length = file.getLength();
-
-        file.modify(this, length, System.currentTimeMillis());
-    }
-
     /**
      * Once a transaction handle was committed it is no longer valid. In
      * order to continue using this DbDirectory instance after a commit, the

Modified: lucene/dev/branches/solr2452/lucene/contrib/demo/src/test/org/apache/lucene/demo/TestDemo.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/demo/src/test/org/apache/lucene/demo/TestDemo.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/demo/src/test/org/apache/lucene/demo/TestDemo.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/demo/src/test/org/apache/lucene/demo/TestDemo.java Sun May 22 21:45:19 2011
@@ -22,16 +22,17 @@ import java.io.File;
 import java.io.PrintStream;
 
 import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.util._TestUtil;
 
 public class TestDemo extends LuceneTestCase {
 
-  private void testOneSearch(String query, int expectedHitCount) throws Exception {
+  private void testOneSearch(File indexPath, String query, int expectedHitCount) throws Exception {
     PrintStream outSave = System.out;
     try {
       ByteArrayOutputStream bytes = new ByteArrayOutputStream();
       PrintStream fakeSystemOut = new PrintStream(bytes);
       System.setOut(fakeSystemOut);
-      SearchFiles.main(new String[] {"-query", query});
+      SearchFiles.main(new String[] {"-query", query, "-index", indexPath.getPath()});
       fakeSystemOut.flush();
       String output = bytes.toString(); // intentionally use default encoding
       assertTrue("output=" + output, output.contains(expectedHitCount + " total matching documents"));
@@ -42,12 +43,13 @@ public class TestDemo extends LuceneTest
 
   public void testIndexSearch() throws Exception {
     File dir = getDataFile("test-files/docs");
-    IndexFiles.main(new String[] { "-create", "-docs", dir.getPath() });
-    testOneSearch("apache", 3);
-    testOneSearch("patent", 8);
-    testOneSearch("lucene", 0);
-    testOneSearch("gnu", 6);
-    testOneSearch("derivative", 8);
-    testOneSearch("license", 13);
+    File indexDir = _TestUtil.getTempDir("ContribDemoTest");
+    IndexFiles.main(new String[] { "-create", "-docs", dir.getPath(), "-index", indexDir.getPath()});
+    testOneSearch(indexDir, "apache", 3);
+    testOneSearch(indexDir, "patent", 8);
+    testOneSearch(indexDir, "lucene", 0);
+    testOneSearch(indexDir, "gnu", 6);
+    testOneSearch(indexDir, "derivative", 8);
+    testOneSearch(indexDir, "license", 13);
   }
 }

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java Sun May 22 21:45:19 2011
@@ -78,7 +78,7 @@ public class Highlighter
 	public final String getBestFragment(Analyzer analyzer, String fieldName,String text)
 		throws IOException, InvalidTokenOffsetsException
 	{
-		TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(text));
+		TokenStream tokenStream = analyzer.reusableTokenStream(fieldName, new StringReader(text));
 		return getBestFragment(tokenStream, text);
 	}
 
@@ -130,7 +130,7 @@ public class Highlighter
 		int maxNumFragments)
 		throws IOException, InvalidTokenOffsetsException
 	{
-		TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(text));
+		TokenStream tokenStream = analyzer.reusableTokenStream(fieldName, new StringReader(text));
 		return getBestFragments(tokenStream, text, maxNumFragments);
 	}
 
@@ -197,6 +197,11 @@ public class Highlighter
 	    tokenStream.reset();
 	    
 		TextFragment currentFrag =	new TextFragment(newText,newText.length(), docFrags.size());
+		
+    if (fragmentScorer instanceof QueryScorer) {
+      ((QueryScorer) fragmentScorer).setMaxDocCharsToAnalyze(maxDocCharsToAnalyze);
+    }
+    
 		TokenStream newStream = fragmentScorer.init(tokenStream);
 		if(newStream != null) {
 		  tokenStream = newStream;
@@ -350,6 +355,7 @@ public class Highlighter
 			{
 				try
 				{
+				  tokenStream.end();
 					tokenStream.close();
 				}
 				catch (Exception e)

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryScorer.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryScorer.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryScorer.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryScorer.java Sun May 22 21:45:19 2011
@@ -54,6 +54,7 @@ public class QueryScorer implements Scor
   private IndexReader reader;
   private boolean skipInitExtractor;
   private boolean wrapToCaching = true;
+  private int maxCharsToAnalyze;
 
   /**
    * @param query Query to use for highlighting
@@ -209,7 +210,7 @@ public class QueryScorer implements Scor
   private TokenStream initExtractor(TokenStream tokenStream) throws IOException {
     WeightedSpanTermExtractor qse = defaultField == null ? new WeightedSpanTermExtractor()
         : new WeightedSpanTermExtractor(defaultField);
-
+    qse.setMaxDocCharsToAnalyze(maxCharsToAnalyze);
     qse.setExpandMultiTermQuery(expandMultiTermQuery);
     qse.setWrapIfNotCachingTokenFilter(wrapToCaching);
     if (reader == null) {
@@ -265,4 +266,8 @@ public class QueryScorer implements Scor
   public void setWrapIfNotCachingTokenFilter(boolean wrap) {
     this.wrapToCaching = wrap;
   }
+
+  public void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze) {
+    this.maxCharsToAnalyze = maxDocCharsToAnalyze;
+  }
 }

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java Sun May 22 21:45:19 2011
@@ -30,6 +30,7 @@ import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
 import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
+import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.TermFreqVector;
@@ -158,10 +159,13 @@ public class TokenSources {
 
       OffsetAttribute offsetAtt;
 
+      PositionIncrementAttribute posincAtt;
+
       StoredTokenStream(Token tokens[]) {
         this.tokens = tokens;
         termAtt = addAttribute(CharTermAttribute.class);
         offsetAtt = addAttribute(OffsetAttribute.class);
+        posincAtt = (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
       }
 
       @Override
@@ -173,6 +177,10 @@ public class TokenSources {
         clearAttributes();
         termAtt.setEmpty().append(token);
         offsetAtt.setOffset(token.startOffset(), token.endOffset());
+        posincAtt
+            .setPositionIncrement(currentToken <= 1
+                || tokens[currentToken - 1].startOffset() > tokens[currentToken - 2]
+                    .startOffset() ? 1 : 0);
         return true;
       }
     }
@@ -180,7 +188,6 @@ public class TokenSources {
     BytesRef[] terms = tpv.getTerms();
     int[] freq = tpv.getTermFrequencies();
     int totalTokens = 0;
-
     for (int t = 0; t < freq.length; t++) {
       totalTokens += freq[t];
     }
@@ -189,7 +196,8 @@ public class TokenSources {
     for (int t = 0; t < freq.length; t++) {
       TermVectorOffsetInfo[] offsets = tpv.getOffsets(t);
       if (offsets == null) {
-        throw new IllegalArgumentException("Required TermVector Offset information was not found");
+        throw new IllegalArgumentException(
+            "Required TermVector Offset information was not found");
       }
 
       int[] pos = null;
@@ -205,8 +213,8 @@ public class TokenSources {
           unsortedTokens = new ArrayList<Token>();
         }
         for (int tp = 0; tp < offsets.length; tp++) {
-          Token token = new Token(terms[t].utf8ToString(), offsets[tp].getStartOffset(), offsets[tp]
-              .getEndOffset());
+          Token token = new Token(terms[t].utf8ToString(),
+              offsets[tp].getStartOffset(), offsets[tp].getEndOffset());
           unsortedTokens.add(token);
         }
       } else {
@@ -221,8 +229,8 @@ public class TokenSources {
         // tokens stored with positions - can use this to index straight into
         // sorted array
         for (int tp = 0; tp < pos.length; tp++) {
-          Token token = new Token(terms[t].utf8ToString(), offsets[tp].getStartOffset(),
-              offsets[tp].getEndOffset());
+          Token token = new Token(terms[t].utf8ToString(),
+              offsets[tp].getStartOffset(), offsets[tp].getEndOffset());
           tokensInOriginalOrder[pos[tp]] = token;
         }
       }
@@ -231,12 +239,11 @@ public class TokenSources {
     if (unsortedTokens != null) {
       tokensInOriginalOrder = unsortedTokens.toArray(new Token[unsortedTokens
           .size()]);
-      ArrayUtil.quickSort(tokensInOriginalOrder, new Comparator<Token>() {
+      ArrayUtil.mergeSort(tokensInOriginalOrder, new Comparator<Token>() {
         public int compare(Token t1, Token t2) {
-          if (t1.startOffset() == t2.startOffset())
-            return t1.endOffset() - t2.endOffset();
-          else
-            return t1.startOffset() - t2.startOffset();
+          if (t1.startOffset() == t2.startOffset()) return t1.endOffset()
+              - t2.endOffset();
+          else return t1.startOffset() - t2.startOffset();
         }
       });
     }
@@ -279,7 +286,11 @@ public class TokenSources {
   // convenience method
   public static TokenStream getTokenStream(String field, String contents,
       Analyzer analyzer) {
-    return analyzer.tokenStream(field, new StringReader(contents));
+    try {
+      return analyzer.reusableTokenStream(field, new StringReader(contents));
+    } catch (IOException ex) {
+      throw new RuntimeException(ex);
+    }
   }
 
 }

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java Sun May 22 21:45:19 2011
@@ -56,6 +56,7 @@ public class WeightedSpanTermExtractor {
   private boolean expandMultiTermQuery;
   private boolean cachedTokenStream;
   private boolean wrapToCaching = true;
+  private int maxDocCharsToAnalyze;
 
   public WeightedSpanTermExtractor() {
   }
@@ -320,13 +321,13 @@ public class WeightedSpanTermExtractor {
 
   private AtomicReaderContext getLeafContextForField(String field) throws IOException {
     if(wrapToCaching && !cachedTokenStream && !(tokenStream instanceof CachingTokenFilter)) {
-      tokenStream = new CachingTokenFilter(tokenStream);
+      tokenStream = new CachingTokenFilter(new OffsetLimitTokenFilter(tokenStream, maxDocCharsToAnalyze));
       cachedTokenStream = true;
     }
     AtomicReaderContext context = readers.get(field);
     if (context == null) {
       MemoryIndex indexer = new MemoryIndex();
-      indexer.addField(field, tokenStream);
+      indexer.addField(field, new OffsetLimitTokenFilter(tokenStream, maxDocCharsToAnalyze));
       tokenStream.reset();
       IndexSearcher searcher = indexer.createSearcher();
       // MEM index has only atomic ctx
@@ -545,4 +546,8 @@ public class WeightedSpanTermExtractor {
   public void setWrapIfNotCachingTokenFilter(boolean wrap) {
     this.wrapToCaching = wrap;
   }
+
+  protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze) {
+    this.maxDocCharsToAnalyze = maxDocCharsToAnalyze;
+  }
 }

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterPhraseTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterPhraseTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterPhraseTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterPhraseTest.java Sun May 22 21:45:19 2011
@@ -58,7 +58,7 @@ public class HighlighterPhraseTest exten
     final String TEXT = "the fox jumped";
     final Directory directory = newDirectory();
     final IndexWriter indexWriter = new IndexWriter(directory,
-        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     try {
       final Document document = new Document();
       document.add(new Field(FIELD, new TokenStreamConcurrent(),
@@ -102,7 +102,7 @@ public class HighlighterPhraseTest exten
     final String TEXT = "the fox jumped";
     final Directory directory = newDirectory();
     final IndexWriter indexWriter = new IndexWriter(directory,
-        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     try {
       final Document document = new Document();
       document.add(new Field(FIELD, new TokenStreamConcurrent(),
@@ -172,7 +172,7 @@ public class HighlighterPhraseTest exten
     final String TEXT = "the fox did not jump";
     final Directory directory = newDirectory();
     final IndexWriter indexWriter = new IndexWriter(directory,
-        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     try {
       final Document document = new Document();
       document.add(new Field(FIELD, new TokenStreamSparse(),
@@ -215,7 +215,7 @@ public class HighlighterPhraseTest exten
     final String TEXT = "the fox did not jump";
     final Directory directory = newDirectory();
     final IndexWriter indexWriter = new IndexWriter(directory,
-        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     try {
       final Document document = new Document();
       document.add(new Field(FIELD, TEXT, Store.YES, Index.ANALYZED,
@@ -256,7 +256,7 @@ public class HighlighterPhraseTest exten
     final String TEXT = "the fox did not jump";
     final Directory directory = newDirectory();
     final IndexWriter indexWriter = new IndexWriter(directory,
-        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     try {
       final Document document = new Document();
       document.add(new Field(FIELD, new TokenStreamSparse(),

Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java Sun May 22 21:45:19 2011
@@ -90,7 +90,7 @@ public class HighlighterTest extends Bas
   Directory ramDir;
   public IndexSearcher searcher = null;
   int numHighlights = 0;
-  final Analyzer analyzer = new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
+  final Analyzer analyzer = new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
   TopDocs hits;
 
   String[] texts = {
@@ -101,7 +101,7 @@ public class HighlighterTest extends Bas
       "wordx wordy wordz wordx wordy wordx worda wordb wordy wordc", "y z x y z a b", "lets is a the lets is a the lets is a the lets" };
 
   public void testQueryScorerHits() throws Exception {
-    Analyzer analyzer = new MockAnalyzer(MockTokenizer.SIMPLE, true);
+    Analyzer analyzer = new MockAnalyzer(random, MockTokenizer.SIMPLE, true);
     QueryParser qp = new QueryParser(TEST_VERSION_CURRENT, FIELD_NAME, analyzer);
     query = qp.parse("\"very long\"");
     searcher = new IndexSearcher(ramDir, true);
@@ -133,7 +133,7 @@ public class HighlighterTest extends Bas
 
     String s1 = "I call our world Flatland, not because we call it so,";
 
-    QueryParser parser = new QueryParser(TEST_VERSION_CURRENT, FIELD_NAME, new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true));
+    QueryParser parser = new QueryParser(TEST_VERSION_CURRENT, FIELD_NAME, new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true));
 
     // Verify that a query against the default field results in text being
     // highlighted
@@ -165,7 +165,7 @@ public class HighlighterTest extends Bas
    */
   private static String highlightField(Query query, String fieldName, String text)
       throws IOException, InvalidTokenOffsetsException {
-    TokenStream tokenStream = new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true).tokenStream(fieldName, new StringReader(text));
+    TokenStream tokenStream = new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true).tokenStream(fieldName, new StringReader(text));
     // Assuming "<B>", "</B>" used to highlight
     SimpleHTMLFormatter formatter = new SimpleHTMLFormatter();
     QueryScorer scorer = new QueryScorer(query, fieldName, FIELD_NAME);
@@ -210,7 +210,7 @@ public class HighlighterTest extends Bas
     String f2c = f2 + ":";
     String q = "(" + f1c + ph1 + " OR " + f2c + ph1 + ") AND (" + f1c + ph2
         + " OR " + f2c + ph2 + ")";
-    Analyzer analyzer = new MockAnalyzer(MockTokenizer.WHITESPACE, false);
+    Analyzer analyzer = new MockAnalyzer(random, MockTokenizer.WHITESPACE, false);
     QueryParser qp = new QueryParser(TEST_VERSION_CURRENT, f1, analyzer);
     Query query = qp.parse(q);
 
@@ -1093,6 +1093,10 @@ public class HighlighterTest extends Bas
   }
 
   public void testMaxSizeHighlight() throws Exception {
+    final MockAnalyzer analyzer = new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
+    // we disable MockTokenizer checks because we will forcefully limit the 
+    // tokenstream and call end() before incrementToken() returns false.
+    analyzer.setEnableChecks(false);
     TestHighlightRunner helper = new TestHighlightRunner() {
 
       @Override
@@ -1122,7 +1126,10 @@ public class HighlighterTest extends Bas
       public void run() throws Exception {
         String goodWord = "goodtoken";
         CharacterRunAutomaton stopWords = new CharacterRunAutomaton(BasicAutomata.makeString("stoppedtoken"));
-
+        // we disable MockTokenizer checks because we will forcefully limit the 
+        // tokenstream and call end() before incrementToken() returns false.
+        final MockAnalyzer analyzer = new MockAnalyzer(random, MockTokenizer.SIMPLE, true, stopWords, true);
+        analyzer.setEnableChecks(false);
         TermQuery query = new TermQuery(new Term("data", goodWord));
 
         String match;
@@ -1134,13 +1141,13 @@ public class HighlighterTest extends Bas
           sb.append("stoppedtoken");
         }
         SimpleHTMLFormatter fm = new SimpleHTMLFormatter();
-        Highlighter hg = getHighlighter(query, "data", new MockAnalyzer(MockTokenizer.SIMPLE, true, stopWords, true).tokenStream(
+        Highlighter hg = getHighlighter(query, "data", analyzer.tokenStream(
             "data", new StringReader(sb.toString())), fm);// new Highlighter(fm,
         // new
         // QueryTermScorer(query));
         hg.setTextFragmenter(new NullFragmenter());
         hg.setMaxDocCharsToAnalyze(100);
-        match = hg.getBestFragment(new MockAnalyzer(MockTokenizer.SIMPLE, true, stopWords, true), "data", sb.toString());
+        match = hg.getBestFragment(analyzer, "data", sb.toString());
         assertTrue("Matched text should be no more than 100 chars in length ", match.length() < hg
             .getMaxDocCharsToAnalyze());
 
@@ -1151,7 +1158,7 @@ public class HighlighterTest extends Bas
         // + whitespace)
         sb.append(" ");
         sb.append(goodWord);
-        match = hg.getBestFragment(new MockAnalyzer(MockTokenizer.SIMPLE, true, stopWords, true), "data", sb.toString());
+        match = hg.getBestFragment(analyzer, "data", sb.toString());
         assertTrue("Matched text should be no more than 100 chars in length ", match.length() < hg
             .getMaxDocCharsToAnalyze());
       }
@@ -1170,10 +1177,10 @@ public class HighlighterTest extends Bas
 
         String text = "this is a text with searchterm in it";
         SimpleHTMLFormatter fm = new SimpleHTMLFormatter();
-        Highlighter hg = getHighlighter(query, "text", new MockAnalyzer(MockTokenizer.SIMPLE, true, stopWords, true).tokenStream("text", new StringReader(text)), fm);
+        Highlighter hg = getHighlighter(query, "text", new MockAnalyzer(random, MockTokenizer.SIMPLE, true, stopWords, true).tokenStream("text", new StringReader(text)), fm);
         hg.setTextFragmenter(new NullFragmenter());
         hg.setMaxDocCharsToAnalyze(36);
-        String match = hg.getBestFragment(new MockAnalyzer(MockTokenizer.SIMPLE, true, stopWords, true), "text", text);
+        String match = hg.getBestFragment(new MockAnalyzer(random, MockTokenizer.SIMPLE, true, stopWords, true), "text", text);
         assertTrue(
             "Matched text should contain remainder of text after highlighted query ",
             match.endsWith("in it"));
@@ -1191,7 +1198,7 @@ public class HighlighterTest extends Bas
         // test to show how rewritten query can still be used
         if (searcher != null) searcher.close();
         searcher = new IndexSearcher(ramDir, true);
-        Analyzer analyzer = new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
+        Analyzer analyzer = new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
 
         QueryParser parser = new QueryParser(TEST_VERSION_CURRENT, FIELD_NAME, analyzer);
         Query query = parser.parse("JF? or Kenned*");
@@ -1446,64 +1453,64 @@ public class HighlighterTest extends Bas
         Highlighter highlighter;
         String result;
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("foo");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("foo");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("Hi-Speed10 <B>foo</B>", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("10");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("10");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("Hi-Speed<B>10</B> foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hi");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hi");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("<B>Hi</B>-Speed10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("speed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("speed");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("Hi-<B>Speed</B>10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hispeed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hispeed");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("<B>Hi-Speed</B>10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hi speed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hi speed");
         highlighter = getHighlighter(query, "text", getTS2(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2(), s, 3, "...");
         assertEquals("<B>Hi-Speed</B>10 foo", result);
 
         // ///////////////// same tests, just put the bigger overlapping token
         // first
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("foo");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("foo");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("Hi-Speed10 <B>foo</B>", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("10");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("10");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("Hi-Speed<B>10</B> foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hi");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hi");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("<B>Hi</B>-Speed10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("speed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("speed");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("Hi-<B>Speed</B>10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hispeed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hispeed");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("<B>Hi-Speed</B>10 foo", result);
 
-        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(MockTokenizer.WHITESPACE, false)).parse("hi speed");
+        query = new QueryParser(TEST_VERSION_CURRENT, "text", new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).parse("hi speed");
         highlighter = getHighlighter(query, "text", getTS2a(), HighlighterTest.this);
         result = highlighter.getBestFragments(getTS2a(), s, 3, "...");
         assertEquals("<B>Hi-Speed</B>10 foo", result);
@@ -1514,7 +1521,7 @@ public class HighlighterTest extends Bas
   }
   
   private Directory dir;
-  private Analyzer a = new MockAnalyzer(MockTokenizer.WHITESPACE, false);
+  private Analyzer a = new MockAnalyzer(random, MockTokenizer.WHITESPACE, false);
   
   public void testWeightedTermsWithDeletes() throws IOException, ParseException, InvalidTokenOffsetsException {
     makeIndex();
@@ -1529,7 +1536,7 @@ public class HighlighterTest extends Bas
   }
   
   private void makeIndex() throws IOException {
-    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)));
+    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)));
     writer.addDocument( doc( "t_text1", "random words for highlighting tests del" ) );
     writer.addDocument( doc( "t_text1", "more random words for second field del" ) );
     writer.addDocument( doc( "t_text1", "random words for highlighting tests del" ) );
@@ -1539,7 +1546,7 @@ public class HighlighterTest extends Bas
   }
   
   private void deleteDocument() throws IOException {
-    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.WHITESPACE, false)).setOpenMode(OpenMode.APPEND));
+    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.WHITESPACE, false)).setOpenMode(OpenMode.APPEND));
     writer.deleteDocuments( new Term( "t_text1", "del" ) );
     // To see negative idf, keep the following line commented out
     //writer.optimize();
@@ -1644,7 +1651,7 @@ public class HighlighterTest extends Bas
     dir = newDirectory();
     ramDir = newDirectory();
     IndexWriter writer = new IndexWriter(ramDir, newIndexWriterConfig(
-        TEST_VERSION_CURRENT, new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true)));
+        TEST_VERSION_CURRENT, new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true)));
     for (String text : texts) {
       addDoc(writer, text);
     }
@@ -1726,6 +1733,11 @@ final class SynonymAnalyzer extends Anal
     stream.addAttribute(CharTermAttribute.class);
     stream.addAttribute(PositionIncrementAttribute.class);
     stream.addAttribute(OffsetAttribute.class);
+    try {
+      stream.reset();
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
     return new SynonymTokenizer(stream, synonyms);
   }
 }

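A note on the recurring MockAnalyzer change above: the analyzer is now seeded from the test's Random (the field inherited from LuceneTestCase), so any randomized behavior inside MockAnalyzer is reproducible from the test seed. The stream.reset() added to SynonymAnalyzer follows the TokenStream contract covered in the sketch after the LuceneMethods diff below. A minimal sketch of the seeded construction, with the seed hardcoded for illustration:

    import java.util.Random;
    import org.apache.lucene.analysis.MockAnalyzer;
    import org.apache.lucene.analysis.MockTokenizer;

    // A seeded Random stands in for LuceneTestCase's "random" field; the
    // same seed reproduces the same analyzer behavior across runs.
    Random random = new Random(42L);
    MockAnalyzer analyzer = new MockAnalyzer(random, MockTokenizer.WHITESPACE, false);
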
Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/TokenSourcesTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/TokenSourcesTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/TokenSourcesTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/TokenSourcesTest.java Sun May 22 21:45:19 2011
@@ -36,7 +36,10 @@ import org.apache.lucene.index.Term;
 import org.apache.lucene.index.TermPositionVector;
 import org.apache.lucene.search.DisjunctionMaxQuery;
 import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
 import org.apache.lucene.search.TopDocs;
+import org.apache.lucene.search.spans.SpanNearQuery;
+import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.search.spans.SpanTermQuery;
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.LockObtainFailedException;
@@ -86,12 +89,12 @@ public class TokenSourcesTest extends Lu
     public void reset() {
       this.i = -1;
       this.tokens = new Token[] {
-          new Token(new char[] { 't', 'h', 'e' }, 0, 3, 0, 3),
-          new Token(new char[] { '{', 'f', 'o', 'x', '}' }, 0, 5, 0, 7),
-          new Token(new char[] { 'f', 'o', 'x' }, 0, 3, 4, 7),
-          new Token(new char[] { 'd', 'i', 'd' }, 0, 3, 8, 11),
-          new Token(new char[] { 'n', 'o', 't' }, 0, 3, 12, 15),
-          new Token(new char[] { 'j', 'u', 'm', 'p' }, 0, 4, 16, 20) };
+          new Token(new char[] {'t', 'h', 'e'}, 0, 3, 0, 3),
+          new Token(new char[] {'{', 'f', 'o', 'x', '}'}, 0, 5, 0, 7),
+          new Token(new char[] {'f', 'o', 'x'}, 0, 3, 4, 7),
+          new Token(new char[] {'d', 'i', 'd'}, 0, 3, 8, 11),
+          new Token(new char[] {'n', 'o', 't'}, 0, 3, 12, 15),
+          new Token(new char[] {'j', 'u', 'm', 'p'}, 0, 4, 16, 20)};
       this.tokens[1].setPositionIncrement(0);
     }
   }
@@ -188,4 +191,97 @@ public class TokenSourcesTest extends Lu
     }
   }
 
+  public void testOverlapWithOffsetExactPhrase() throws CorruptIndexException,
+      LockObtainFailedException, IOException, InvalidTokenOffsetsException {
+    final String TEXT = "the fox did not jump";
+    final Directory directory = newDirectory();
+    final IndexWriter indexWriter = new IndexWriter(directory,
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new OverlapAnalyzer()));
+    try {
+      final Document document = new Document();
+      document.add(new Field(FIELD, new TokenStreamOverlap(),
+          TermVector.WITH_OFFSETS));
+      indexWriter.addDocument(document);
+    } finally {
+      indexWriter.close();
+    }
+    final IndexReader indexReader = IndexReader.open(directory, true);
+    try {
+      assertEquals(1, indexReader.numDocs());
+      final IndexSearcher indexSearcher = newSearcher(indexReader);
+      try {
+        // final DisjunctionMaxQuery query = new DisjunctionMaxQuery(1);
+        // query.add(new SpanTermQuery(new Term(FIELD, "{fox}")));
+        // query.add(new SpanTermQuery(new Term(FIELD, "fox")));
+        final Query phraseQuery = new SpanNearQuery(new SpanQuery[] {
+            new SpanTermQuery(new Term(FIELD, "the")),
+            new SpanTermQuery(new Term(FIELD, "fox"))}, 0, true);
+
+        TopDocs hits = indexSearcher.search(phraseQuery, 1);
+        assertEquals(1, hits.totalHits);
+        final Highlighter highlighter = new Highlighter(
+            new SimpleHTMLFormatter(), new SimpleHTMLEncoder(),
+            new QueryScorer(phraseQuery));
+        final TokenStream tokenStream = TokenSources
+            .getTokenStream(
+                (TermPositionVector) indexReader.getTermFreqVector(0, FIELD),
+                false);
+        assertEquals("<B>the fox</B> did not jump",
+            highlighter.getBestFragment(tokenStream, TEXT));
+      } finally {
+        indexSearcher.close();
+      }
+    } finally {
+      indexReader.close();
+      directory.close();
+    }
+  }
+
+  public void testOverlapWithPositionsAndOffsetExactPhrase()
+      throws CorruptIndexException, LockObtainFailedException, IOException,
+      InvalidTokenOffsetsException {
+    final String TEXT = "the fox did not jump";
+    final Directory directory = newDirectory();
+    final IndexWriter indexWriter = new IndexWriter(directory,
+        newIndexWriterConfig(TEST_VERSION_CURRENT, new OverlapAnalyzer()));
+    try {
+      final Document document = new Document();
+      document.add(new Field(FIELD, new TokenStreamOverlap(),
+          TermVector.WITH_POSITIONS_OFFSETS));
+      indexWriter.addDocument(document);
+    } finally {
+      indexWriter.close();
+    }
+    final IndexReader indexReader = IndexReader.open(directory, true);
+    try {
+      assertEquals(1, indexReader.numDocs());
+      final IndexSearcher indexSearcher = newSearcher(indexReader);
+      try {
+        // final DisjunctionMaxQuery query = new DisjunctionMaxQuery(1);
+        // query.add(new SpanTermQuery(new Term(FIELD, "the")));
+        // query.add(new SpanTermQuery(new Term(FIELD, "fox")));
+        final Query phraseQuery = new SpanNearQuery(new SpanQuery[] {
+            new SpanTermQuery(new Term(FIELD, "the")),
+            new SpanTermQuery(new Term(FIELD, "fox"))}, 0, true);
+
+        TopDocs hits = indexSearcher.search(phraseQuery, 1);
+        assertEquals(1, hits.totalHits);
+        final Highlighter highlighter = new Highlighter(
+            new SimpleHTMLFormatter(), new SimpleHTMLEncoder(),
+            new QueryScorer(phraseQuery));
+        final TokenStream tokenStream = TokenSources
+            .getTokenStream(
+                (TermPositionVector) indexReader.getTermFreqVector(0, FIELD),
+                false);
+        assertEquals("<B>the fox</B> did not jump",
+            highlighter.getBestFragment(tokenStream, TEXT));
+      } finally {
+        indexSearcher.close();
+      }
+    } finally {
+      indexReader.close();
+      directory.close();
+    }
+  }
+
 }

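The two new tests are identical except for the term-vector flavor they index (TermVector.WITH_OFFSETS vs. WITH_POSITIONS_OFFSETS); both rebuild a TokenStream from the stored vector and highlight an exact phrase. The SpanNearQuery arguments do the phrase-matching work: slop 0 and inOrder=true together require the two terms to be adjacent and in order. A minimal sketch (the field name here is illustrative):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // slop = 0, inOrder = true: "the" must be immediately followed by
    // "fox", which is how the tests express the exact phrase "the fox".
    Query phrase = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("text", "the")),
        new SpanTermQuery(new Term("text", "fox"))}, 0, true);
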
Modified: lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/AbstractTestCase.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/AbstractTestCase.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/AbstractTestCase.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/AbstractTestCase.java Sun May 22 21:45:19 2011
@@ -87,9 +87,9 @@ public abstract class AbstractTestCase e
   @Override
   public void setUp() throws Exception {
     super.setUp();
-    analyzerW = new MockAnalyzer(MockTokenizer.WHITESPACE, false);
+    analyzerW = new MockAnalyzer(random, MockTokenizer.WHITESPACE, false);
     analyzerB = new BigramAnalyzer();
-    analyzerK = new MockAnalyzer(MockTokenizer.KEYWORD, false);
+    analyzerK = new MockAnalyzer(random, MockTokenizer.KEYWORD, false);
     paW = new QueryParser(TEST_VERSION_CURRENT,  F, analyzerW );
     paB = new QueryParser(TEST_VERSION_CURRENT,  F, analyzerB );
     dir = newDirectory();

Modified: lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexWriter.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexWriter.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexWriter.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/java/org/apache/lucene/store/instantiated/InstantiatedIndexWriter.java Sun May 22 21:45:19 2011
@@ -532,7 +532,7 @@ public class InstantiatedIndexWriter imp
           if (field.tokenStreamValue() != null) {
             tokenStream = field.tokenStreamValue();
           } else {
-            tokenStream = analyzer.tokenStream(field.name(), new StringReader(field.stringValue()));
+            tokenStream = analyzer.reusableTokenStream(field.name(), new StringReader(field.stringValue()));
           }
 
           // reset the TokenStream to the first token          

Modified: lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestEmptyIndex.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestEmptyIndex.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestEmptyIndex.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestEmptyIndex.java Sun May 22 21:45:19 2011
@@ -59,7 +59,7 @@ public class TestEmptyIndex extends Luce
 
     // make sure a Directory acts the same
     Directory d = newDirectory();
-    new IndexWriter(d, newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer())).close();
+    new IndexWriter(d, newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random))).close();
     r = IndexReader.open(d, false);
     testNorms(r);
     r.close();
@@ -84,7 +84,7 @@ public class TestEmptyIndex extends Luce
 
     // make sure a Directory acts the same
     Directory d = newDirectory();
-    new IndexWriter(d, newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer())).close();
+    new IndexWriter(d, newIndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random))).close();
     r = IndexReader.open(d, false);
     termsEnumTest(r);
     r.close();

Modified: lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestIndicesEquals.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestIndicesEquals.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestIndicesEquals.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestIndicesEquals.java Sun May 22 21:45:19 2011
@@ -21,6 +21,7 @@ import java.util.Arrays;
 import java.util.Comparator;
 import java.util.Iterator;
 import java.util.List;
+import java.util.Random;
 
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenStream;
@@ -65,7 +66,7 @@ public class TestIndicesEquals extends L
 
     // create dir data
     IndexWriter indexWriter = new IndexWriter(dir, newIndexWriterConfig(
-        TEST_VERSION_CURRENT, new MockAnalyzer()).setMergePolicy(newInOrderLogMergePolicy()));
+        TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMergePolicy(newLogMergePolicy()));
     
     for (int i = 0; i < 20; i++) {
       Document document = new Document();
@@ -88,10 +89,13 @@ public class TestIndicesEquals extends L
 
     Directory dir = newDirectory();
     InstantiatedIndex ii = new InstantiatedIndex();
-
+    
+    // we need to pass the "same" random to both, so that both index the same payload data.
+    long seed = random.nextLong();
+    
     // create dir data
     IndexWriter indexWriter = new IndexWriter(dir, newIndexWriterConfig(
-                                                                        TEST_VERSION_CURRENT, new MockAnalyzer()).setMergePolicy(newInOrderLogMergePolicy()));
+                                                                        TEST_VERSION_CURRENT, new MockAnalyzer(new Random(seed))).setMergePolicy(newLogMergePolicy()));
     indexWriter.setInfoStream(VERBOSE ? System.out : null);
     if (VERBOSE) {
       System.out.println("TEST: make test index");
@@ -104,7 +108,7 @@ public class TestIndicesEquals extends L
     indexWriter.close();
 
     // test ii writer
-    InstantiatedIndexWriter instantiatedIndexWriter = ii.indexWriterFactory(new MockAnalyzer(), true);
+    InstantiatedIndexWriter instantiatedIndexWriter = ii.indexWriterFactory(new MockAnalyzer(new Random(seed)), true);
     for (int i = 0; i < 500; i++) {
       Document document = new Document();
       assembleDocument(document, i);

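The seed trick above relies on java.util.Random being deterministic for a given seed: two instances built from the same seed emit identical sequences, so the Directory-backed writer and the InstantiatedIndexWriter see byte-identical payload data. A self-contained sketch:

    import java.util.Random;

    // Two Randoms from one seed produce the same sequence, so two
    // analyzers driven by them tokenize identically.
    long seed = new Random().nextLong();
    Random r1 = new Random(seed);
    Random r2 = new Random(seed);
    assert r1.nextLong() == r2.nextLong();
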
Modified: lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestRealTime.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestRealTime.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestRealTime.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestRealTime.java Sun May 22 21:45:19 2011
@@ -36,7 +36,7 @@ public class TestRealTime extends Lucene
 
     InstantiatedIndex index = new InstantiatedIndex();
     InstantiatedIndexReader reader = new InstantiatedIndexReader(index);
-    IndexSearcher searcher = newSearcher(reader);
+    IndexSearcher searcher = newSearcher(reader, false);
     InstantiatedIndexWriter writer = new InstantiatedIndexWriter(index);
 
     Document doc;

Modified: lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestUnoptimizedReaderOnConstructor.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestUnoptimizedReaderOnConstructor.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestUnoptimizedReaderOnConstructor.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/instantiated/src/test/org/apache/lucene/store/instantiated/TestUnoptimizedReaderOnConstructor.java Sun May 22 21:45:19 2011
@@ -34,17 +34,17 @@ public class TestUnoptimizedReaderOnCons
 
   public void test() throws Exception {
     Directory dir = newDirectory();
-    IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer()));
+    IndexWriter iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random)));
     addDocument(iw, "Hello, world!");
     addDocument(iw, "All work and no play makes jack a dull boy");
     iw.close();
 
-    iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer()).setOpenMode(OpenMode.APPEND));
+    iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random)).setOpenMode(OpenMode.APPEND));
     addDocument(iw, "Hello, tellus!");
     addDocument(iw, "All work and no play makes danny a dull boy");
     iw.close();
 
-    iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer()).setOpenMode(OpenMode.APPEND));
+    iw = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random)).setOpenMode(OpenMode.APPEND));
     addDocument(iw, "Hello, earth!");
     addDocument(iw, "All work and no play makes wendy a dull girl");
     iw.close();

Modified: lucene/dev/branches/solr2452/lucene/contrib/lucli/src/java/lucli/LuceneMethods.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/lucli/src/java/lucli/LuceneMethods.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/lucli/src/java/lucli/LuceneMethods.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/lucli/src/java/lucli/LuceneMethods.java Sun May 22 21:45:19 2011
@@ -305,11 +305,12 @@ class LuceneMethods {
 
           int position = 0;
           // Tokenize field and add to postingTable
-          TokenStream stream = analyzer.tokenStream(fieldName, reader);
+          TokenStream stream = analyzer.reusableTokenStream(fieldName, reader);
           CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
           PositionIncrementAttribute posIncrAtt = stream.addAttribute(PositionIncrementAttribute.class);
           
           try {
+            stream.reset();
             while (stream.incrementToken()) {
               position += (posIncrAtt.getPositionIncrement() - 1);
               position++;
@@ -323,6 +324,7 @@ class LuceneMethods {
               }
               if (position > maxFieldLength) break;
             }
+            stream.end();
           } finally {
             stream.close();
           }

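The LuceneMethods change above shows the full TokenStream lifecycle this commit standardizes on: obtain the stream via reusableTokenStream() (so the analyzer can recycle one instance per thread instead of allocating per field), call reset() before the first incrementToken(), end() after the last, and close() in a finally block. A minimal sketch of that lifecycle (the method name and output are illustrative):

    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    static void consume(Analyzer analyzer, String field, String text) throws IOException {
      TokenStream stream = analyzer.reusableTokenStream(field, new StringReader(text));
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      try {
        stream.reset();                   // required before the first incrementToken()
        while (stream.incrementToken()) {
          System.out.println(term.toString());
        }
        stream.end();                     // records the final offset state
      } finally {
        stream.close();
      }
    }
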
Modified: lucene/dev/branches/solr2452/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/memory/src/java/org/apache/lucene/index/memory/MemoryIndex.java Sun May 22 21:45:19 2011
@@ -261,8 +261,12 @@ public class MemoryIndex {
     if (analyzer == null)
       throw new IllegalArgumentException("analyzer must not be null");
     
-    TokenStream stream = analyzer.tokenStream(fieldName, 
-    		new StringReader(text));
+    TokenStream stream;
+    try {
+      stream = analyzer.reusableTokenStream(fieldName, new StringReader(text));
+    } catch (IOException ex) {
+      throw new RuntimeException(ex);
+    }
 
     addField(fieldName, stream);
   }

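MemoryIndex.addField(String, String, Analyzer) declares no IOException, which is why the checked exception from reusableTokenStream() is wrapped in a RuntimeException above rather than propagated. For context, a hypothetical usage of this path (an Analyzer and Query are assumed to be in scope):

    import org.apache.lucene.index.memory.MemoryIndex;

    MemoryIndex index = new MemoryIndex();
    // Tokenizes via the analyzer internally, through the code path patched above.
    index.addField("content", "the quick brown fox", analyzer);
    float score = index.search(query);  // 0.0f means no match
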
Modified: lucene/dev/branches/solr2452/lucene/contrib/memory/src/test/org/apache/lucene/index/memory/MemoryIndexTest.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/memory/src/test/org/apache/lucene/index/memory/MemoryIndexTest.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/memory/src/test/org/apache/lucene/index/memory/MemoryIndexTest.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/memory/src/test/org/apache/lucene/index/memory/MemoryIndexTest.java Sun May 22 21:45:19 2011
@@ -143,9 +143,9 @@ public class MemoryIndexTest extends Bas
    */
   private Analyzer randomAnalyzer() {
     switch(random.nextInt(3)) {
-      case 0: return new MockAnalyzer(MockTokenizer.SIMPLE, true);
-      case 1: return new MockAnalyzer(MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
-      default: return new MockAnalyzer(MockTokenizer.WHITESPACE, false);
+      case 0: return new MockAnalyzer(random, MockTokenizer.SIMPLE, true);
+      case 1: return new MockAnalyzer(random, MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET, true);
+      default: return new MockAnalyzer(random, MockTokenizer.WHITESPACE, false);
     }
   }
   

Modified: lucene/dev/branches/solr2452/lucene/contrib/misc/src/java/org/apache/lucene/index/BalancedSegmentMergePolicy.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/misc/src/java/org/apache/lucene/index/BalancedSegmentMergePolicy.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/misc/src/java/org/apache/lucene/index/BalancedSegmentMergePolicy.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/misc/src/java/org/apache/lucene/index/BalancedSegmentMergePolicy.java Sun May 22 21:45:19 2011
@@ -19,6 +19,7 @@ package org.apache.lucene.index;
 
 
 import java.io.IOException;
+import java.util.Collections;
 import java.util.Set;
 
 /**
@@ -135,7 +136,7 @@ public class BalancedSegmentMergePolicy 
           if (last > 1 || !isOptimized(infos.info(0))) {
 
             spec = new MergeSpecification();
-            spec.add(new OneMerge(infos.range(0, last)));
+            spec.add(new OneMerge(infos.asList().subList(0, last)));
           }
         } else if (last > maxNumSegments) {
 
@@ -192,7 +193,7 @@ public class BalancedSegmentMergePolicy 
       prev = backLink[i][prev];
       int mergeStart = i + prev;
       if((mergeEnd - mergeStart) > 1) {
-        spec.add(new OneMerge(infos.range(mergeStart, mergeEnd)));
+        spec.add(new OneMerge(infos.asList().subList(mergeStart, mergeEnd)));
       } else {
         if(partialExpunge) {
           SegmentInfo info = infos.info(mergeStart);
@@ -208,7 +209,7 @@ public class BalancedSegmentMergePolicy 
     
     if(partialExpunge && maxDelCount > 0) {
       // expunge deletes
-      spec.add(new OneMerge(infos.range(expungeCandidate, expungeCandidate + 1)));
+      spec.add(new OneMerge(Collections.singletonList(infos.info(expungeCandidate))));
     }
     
     return spec;
@@ -250,7 +251,10 @@ public class BalancedSegmentMergePolicy 
     MergeSpecification spec = null;
     
     if(numLargeSegs < numSegs) {
-      SegmentInfos smallSegments = infos.range(numLargeSegs, numSegs);
+      // hack to create a shallow sub-range as a SegmentInfos instance;
+      // it does not clone all metadata, but LogMergePolicy does not need it
+      final SegmentInfos smallSegments = new SegmentInfos();
+      smallSegments.rollbackSegmentInfos(infos.asList().subList(numLargeSegs, numSegs));
       spec = super.findMergesToExpungeDeletes(smallSegments);
     }
     
@@ -258,7 +262,7 @@ public class BalancedSegmentMergePolicy 
     for(int i = 0; i < numLargeSegs; i++) {
       SegmentInfo info = infos.info(i);
       if(info.hasDeletions()) {
-        spec.add(new OneMerge(infos.range(i, i + 1)));
+        spec.add(new OneMerge(Collections.singletonList(infos.info(i))));
       }
     }
     return spec;
@@ -296,7 +300,7 @@ public class BalancedSegmentMergePolicy 
       if(totalSmallSegSize < targetSegSize * 2) {
         MergeSpecification spec = findBalancedMerges(infos, numLargeSegs, (numLargeSegs - 1), _partialExpunge);
         if(spec == null) spec = new MergeSpecification(); // should not happen
-        spec.add(new OneMerge(infos.range(numLargeSegs, numSegs)));
+        spec.add(new OneMerge(infos.asList().subList(numLargeSegs, numSegs)));
         return spec;
       } else {
         return findBalancedMerges(infos, numSegs, numLargeSegs, _partialExpunge);
@@ -311,11 +315,13 @@ public class BalancedSegmentMergePolicy 
         if(size(info) < sizeThreshold) break;
         startSeg++;
       }
-      spec.add(new OneMerge(infos.range(startSeg, numSegs)));
+      spec.add(new OneMerge(infos.asList().subList(startSeg, numSegs)));
       return spec;
     } else {
-      // apply the log merge policy to small segments.
-      SegmentInfos smallSegments = infos.range(numLargeSegs, numSegs);
+      // hack to create a shallow sub-range as a SegmentInfos instance;
+      // it does not clone all metadata, but LogMergePolicy does not need it
+      final SegmentInfos smallSegments = new SegmentInfos();
+      smallSegments.rollbackSegmentInfos(infos.asList().subList(numLargeSegs, numSegs));
       MergeSpecification spec = super.findMerges(smallSegments);
       
       if(_partialExpunge) {
@@ -342,7 +348,7 @@ public class BalancedSegmentMergePolicy 
       }
     }
     if (maxDelCount > 0) {
-      return new OneMerge(infos.range(expungeCandidate, expungeCandidate + 1));
+      return new OneMerge(Collections.singletonList(infos.info(expungeCandidate)));
     }
     return null;
   }

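The pattern across this file: SegmentInfos.range() is gone, so merge descriptions are now built from java.util.List views, with Collections.singletonList() for the single-segment case, and a shallow SegmentInfos copy (via rollbackSegmentInfos) only where a LogMergePolicy method genuinely needs a SegmentInfos. A minimal sketch of the two list-based constructions used above (method names are illustrative):

    import java.util.Collections;
    import org.apache.lucene.index.MergePolicy.OneMerge;
    import org.apache.lucene.index.SegmentInfos;

    // Contiguous range [start, end) as a sub-list view (no copying).
    static OneMerge mergeRange(SegmentInfos infos, int start, int end) {
      return new OneMerge(infos.asList().subList(start, end));
    }

    // Single segment, as done for the expunge-deletes candidates above.
    static OneMerge mergeSingle(SegmentInfos infos, int i) {
      return new OneMerge(Collections.singletonList(infos.info(i)));
    }
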
Modified: lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestFieldNormModifier.java Sun May 22 21:45:19 2011
@@ -61,7 +61,7 @@ public class TestFieldNormModifier exten
     super.setUp();
     store = newDirectory();
     IndexWriter writer = new IndexWriter(store, newIndexWriterConfig(
-                                                                     TEST_VERSION_CURRENT, new MockAnalyzer()).setMergePolicy(newInOrderLogMergePolicy()));
+                                                                     TEST_VERSION_CURRENT, new MockAnalyzer(random)).setMergePolicy(newLogMergePolicy()));
     
     for (int i = 0; i < NUM_DOCS; i++) {
       Document d = new Document();

Modified: lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestIndexSplitter.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestIndexSplitter.java?rev=1126234&r1=1126233&r2=1126234&view=diff
==============================================================================
--- lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestIndexSplitter.java (original)
+++ lucene/dev/branches/solr2452/lucene/contrib/misc/src/test/org/apache/lucene/index/TestIndexSplitter.java Sun May 22 21:45:19 2011
@@ -39,7 +39,7 @@ public class TestIndexSplitter extends L
     mergePolicy.setNoCFSRatio(1);
     IndexWriter iw = new IndexWriter(
         fsDir,
-        new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer()).
+        new IndexWriterConfig(TEST_VERSION_CURRENT, new MockAnalyzer(random)).
             setOpenMode(OpenMode.CREATE).
             setMergePolicy(mergePolicy)
     );