You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Uwe Schindler <uw...@thetaphi.de> on 2011/11/28 08:51:28 UTC

RE: svn commit: r1206916 - in /lucene/dev/branches/branch_3x: ./ lucene/ lucene/backwards/src/ lucene/backwards/src/test-framework/ lucene/backwards/src/test/ solr/ solr/core/src/java/org/apache/solr/analysis/ solr/core/src/java/org/apache/solr/schema/ so

Hi Erick,

I had to fix the Java 5 errors in the 3x branch commit:
In Java 5 interfaces do not support @Override (which is in my opinion correct and is horrible that it was introduced in Java 6: @Override on interfaces is wrong, as nothing is overridden), but for stupidity JDK6's compiler has a well-known bug and does not detect this syntax violation with -source 1.5).

I recommend having a JDK5 installed to do the final test run before committing.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: erick@apache.org [mailto:erick@apache.org]
> Sent: Monday, November 28, 2011 12:23 AM
> To: commits@lucene.apache.org
> Subject: svn commit: r1206916 - in /lucene/dev/branches/branch_3x: ./ lucene/
> lucene/backwards/src/ lucene/backwards/src/test-framework/
> lucene/backwards/src/test/ solr/ solr/core/src/java/org/apache/solr/analysis/
> solr/core/src/java/org/apache/solr/schema/ sol...
> 
> Author: erick
> Date: Sun Nov 27 23:23:00 2011
> New Revision: 1206916
> 
> URL: http://svn.apache.org/viewvc?rev=1206916&view=rev
> Log:
> Rework of SOLR-2438, introducing MultiTermAware for automatically
> configuring how to handle multiterm queries, e.g. lowercasing, accent folding,
> etc.
> 
> Added:
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/
> MultiTermAwareComponent.java
>       - copied unchanged from r1206767,
> lucene/dev/trunk/solr/core/src/java/org/apache/solr/analysis/MultiTermAwar
> eComponent.java
> Modified:
>     lucene/dev/branches/branch_3x/   (props changed)
>     lucene/dev/branches/branch_3x/lucene/   (props changed)
>     lucene/dev/branches/branch_3x/lucene/backwards/src/   (props changed)
>     lucene/dev/branches/branch_3x/lucene/backwards/src/test/   (props
> changed)
>     lucene/dev/branches/branch_3x/lucene/backwards/src/test-framework/
> (props changed)
>     lucene/dev/branches/branch_3x/solr/   (props changed)
>     lucene/dev/branches/branch_3x/solr/CHANGES.txt
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/A
> SCIIFoldingFilterFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseFilterFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseTokenizerFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/
> MappingCharFilterFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/P
> ersianCharFilterFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/T
> okenFilterFactory.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldProperties.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldType.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/In
> dexSchema.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/S
> chemaField.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/T
> extField.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/search/Sol
> rQueryParser.java
>     lucene/dev/branches/branch_3x/solr/core/src/test-files/solr/conf/schema-
> folding.xml
> 
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/schema/M
> ultiTermTest.java
> 
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/search/Tes
> tFoldingMultitermQuery.java
>     lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml
>     lucene/dev/branches/branch_3x/solr/solrj/   (props changed)
> 
> Modified: lucene/dev/branches/branch_3x/solr/CHANGES.txt
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/CHANGES.t
> xt?rev=1206916&r1=1206915&r2=1206916&view=diff
> ================================================================
> ==============
> --- lucene/dev/branches/branch_3x/solr/CHANGES.txt (original)
> +++ lucene/dev/branches/branch_3x/solr/CHANGES.txt Sun Nov 27 23:23:00
> 2011
> @@ -31,9 +31,11 @@ New Features
>  * SOLR-1565: StreamingUpdateSolrServer supports RequestWriter API and
> therefore, javabin update
>    format (shalin)
> 
> -* SOLR-2438: Case insensitive search for wildcard queries. Actually, the ability
> to specify
> -  a complete analysis chain for multiterm queries.
> -  (Pete Sturge Erick Erickson, Mentoring from Seeley and Muir)
> +* SOLR-2438 added MultiTermAwareComponent to the various classes to
> allow automatic lowercasing
> +  for multiterm queries (wildcards, regex, prefix, range, etc). You can now
> optionally specify a
> +  "multiterm" analyzer in our schema.xml, but Solr should "do the right thing"
> if you don't
> +  specify <fieldType="multiterm"> (Pete Sturge Erick Erickson, Mentoring from
> Seeley and Muir)
> +
> 
>  Bug Fixes
>  ----------------------
> @@ -183,6 +185,11 @@ New Features
>  * SOLR-2438: Case insensitive search for wildcard queries. Actually, the ability
> to specify
>    a complete analysis chain for multiterm queries.
>    (Pete Sturge Erick Erickson, Mentoring from Seeley and Muir)
> +
> +* SOLR-2918 Improvement to SOLR-2438, added MultiTermAwareComponent
> to the various classes
> +  that should transform multiterm queries in various ways, and use this as the
> criteria for
> +  adding them to the multiterm analyzer that is constructed if not specified in
> the
> +  <fieldType>
> 
> 
>  Optimizations
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/A
> SCIIFoldingFilterFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/ASCIIFoldingFilterFactory.java?rev=1206916&r1=1
> 206915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/A
> SCIIFoldingFilterFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/A
> SCIIFoldingFilterFactory.java Sun Nov 27 23:23:00 2011
> @@ -32,9 +32,14 @@ import org.apache.lucene.analysis.TokenS
>   * &lt;/fieldType&gt;</pre>
>   * @version $Id$
>   */
> -public class ASCIIFoldingFilterFactory extends BaseTokenFilterFactory {
> +public class ASCIIFoldingFilterFactory extends BaseTokenFilterFactory
> implements MultiTermAwareComponent {
>    public ASCIIFoldingFilter create(TokenStream input) {
>      return new ASCIIFoldingFilter(input);
>    }
> +
> +  @Override
> +  public Object getMultiTermComponent() {
> +    return this;
> +  }
>  }
> 
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseFilterFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/LowerCaseFilterFactory.java?rev=1206916&r1=12
> 06915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseFilterFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseFilterFactory.java Sun Nov 27 23:23:00 2011
> @@ -33,7 +33,7 @@ import org.apache.lucene.analysis.LowerC
>   * &lt;/fieldType&gt;</pre>
>   * @version $Id$
>   */
> -public class LowerCaseFilterFactory extends BaseTokenFilterFactory {
> +public class LowerCaseFilterFactory extends BaseTokenFilterFactory
> implements MultiTermAwareComponent {
>    @Override
>    public void init(Map<String,String> args) {
>      super.init(args);
> @@ -43,4 +43,9 @@ public class LowerCaseFilterFactory exte
>    public LowerCaseFilter create(TokenStream input) {
>      return new LowerCaseFilter(luceneMatchVersion,input);
>    }
> +
> +  @Override
> +  public Object getMultiTermComponent() {
> +    return this;
> +  }
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseTokenizerFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/LowerCaseTokenizerFactory.java?rev=1206916&r1
> =1206915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseTokenizerFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/L
> owerCaseTokenizerFactory.java Sun Nov 27 23:23:00 2011
> @@ -17,8 +17,8 @@
> 
>  package org.apache.solr.analysis;
> 
> +import org.apache.lucene.analysis.LowerCaseFilter;
>  import org.apache.lucene.analysis.LowerCaseTokenizer;
> -
>  import java.io.Reader;
>  import java.util.Map;
> 
> @@ -32,7 +32,7 @@ import java.util.Map;
>   * &lt;/fieldType&gt;</pre>
>   * @version $Id$
>   */
> -public class LowerCaseTokenizerFactory extends BaseTokenizerFactory {
> +public class LowerCaseTokenizerFactory extends BaseTokenizerFactory
> implements MultiTermAwareComponent {
>    @Override
>    public void init(Map<String,String> args) {
>      super.init(args);
> @@ -42,4 +42,10 @@ public class LowerCaseTokenizerFactory e
>    public LowerCaseTokenizer create(Reader input) {
>      return new LowerCaseTokenizer(luceneMatchVersion,input);
>    }
> +
> +  public Object getMultiTermComponent() {
> +    LowerCaseFilterFactory filt = new LowerCaseFilterFactory();
> +    filt.init(args);
> +    return filt;
> +  }
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/
> MappingCharFilterFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/MappingCharFilterFactory.java?rev=1206916&r1=
> 1206915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/
> MappingCharFilterFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/
> MappingCharFilterFactory.java Sun Nov 27 23:23:00 2011
> @@ -46,7 +46,7 @@ import org.apache.solr.util.plugin.Resou
>   *
>   */
>  public class MappingCharFilterFactory extends BaseCharFilterFactory
> implements
> -    ResourceLoaderAware {
> +    ResourceLoaderAware, MultiTermAwareComponent {
> 
>    protected NormalizeCharMap normMap;
>    private String mapping;
> @@ -126,4 +126,9 @@ public class MappingCharFilterFactory ex
>      }
>      return new String( out, 0, writePos );
>    }
> +
> +  @Override
> +  public Object getMultiTermComponent() {
> +    return this;
> +  }
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/P
> ersianCharFilterFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/PersianCharFilterFactory.java?rev=1206916&r1=1
> 206915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/P
> ersianCharFilterFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/P
> ersianCharFilterFactory.java Sun Nov 27 23:23:00 2011
> @@ -31,9 +31,14 @@ import org.apache.lucene.analysis.fa.Per
>   * &lt;/fieldType&gt;</pre>
>   * @version $Id$
>   */
> -public class PersianCharFilterFactory extends BaseCharFilterFactory {
> +public class PersianCharFilterFactory extends BaseCharFilterFactory
> implements MultiTermAwareComponent {
> 
>    public CharStream create(CharStream input) {
>      return new PersianCharFilter(input);
>    }
> +
> +  @Override
> +  public Object getMultiTermComponent() {
> +    return this;
> +  }
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/T
> okenFilterFactory.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/analysis/TokenFilterFactory.java?rev=1206916&r1=120691
> 5&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/T
> okenFilterFactory.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/analysis/T
> okenFilterFactory.java Sun Nov 27 23:23:00 2011
> @@ -67,3 +67,4 @@ public interface TokenFilterFactory {
>    /** Transform the specified input TokenStream */
>    public TokenStream create(TokenStream input);
>  }
> +
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldProperties.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/schema/FieldProperties.java?rev=1206916&r1=1206915&r
> 2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldProperties.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldProperties.java Sun Nov 27 23:23:00 2011
> @@ -48,15 +48,13 @@ public abstract class FieldProperties {
> 
>    protected final static int REQUIRED            = 0x00001000;
>    protected final static int OMIT_POSITIONS      = 0x00002000;
> -  protected final static int LEGACY_MULTITERM    = 0x00004000;
> -
> +
>    static final String[] propertyNames = {
>            "indexed", "tokenized", "stored",
>            "binary", "omitNorms", "omitTermFreqAndPositions",
>            "termVectors", "termPositions", "termOffsets",
>            "multiValued",
> -          "sortMissingFirst","sortMissingLast","required", "omitPositions",
> -          "legacyMultiTerm"
> +          "sortMissingFirst","sortMissingLast","required", "omitPositions"
>    };
> 
>    static final Map<String,Integer> propertyMap = new
> HashMap<String,Integer>();
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldType.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/schema/FieldType.java?rev=1206916&r1=1206915&r2=120
> 6916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldType.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/Fi
> eldType.java Sun Nov 27 23:23:00 2011
> @@ -431,21 +431,6 @@ public abstract class FieldType extends
>    protected Analyzer queryAnalyzer=analyzer;
> 
>    /**
> -    * Analyzer set by schema for text types to use when searching fields
> -    * of this type, subclasses can set analyzer themselves or override
> -    * getAnalyzer()
> -    * This analyzer is used to process wildcard, prefix, regex and other
> multiterm queries. It
> -    * assembles a list of tokenizer +filters that "make sense" for this, primarily
> accent folding and
> -    * lowercasing filters, and charfilters.
> -    *
> -    * If users require old-style behavior, they can specify
> 'legacyMultiterm="true" ' in the schema file
> -    * @see #getMultiTermAnalyzer
> -    * @see #setMultiTermAnalyzer
> -    */
> -   protected Analyzer multiTermAnalyzer=null;
> -
> -
> -  /**
>     * Returns the Analyzer to be used when indexing fields of this type.
>     * <p>
>     * This method may be called many times, at any time.
> @@ -466,23 +451,6 @@ public abstract class FieldType extends
>    public Analyzer getQueryAnalyzer() {
>      return queryAnalyzer;
>    }
> -
> -  /**
> -   * Returns the Analyzer to be used when searching fields of this type when
> mult-term queries are specified.
> -   * <p>
> -   * This method may be called many times, at any time.
> -   * </p>
> -   *
> -   * @see #getAnalyzer
> -   */
> -  public Analyzer getMultiTermAnalyzer() {
> -    return multiTermAnalyzer;
> -  }
> -
> -  private final String analyzerError =
> -    "FieldType: " + this.getClass().getSimpleName() +
> -    " (" + typeName + ") does not support specifying an analyzer";
> -
>    /**
>     * Sets the Analyzer to be used when indexing fields of this type.
>     *
> @@ -507,9 +475,9 @@ public abstract class FieldType extends
> 
>    /**
>     * Sets the Analyzer to be used when querying fields of this type.
> -   * <p/>
> +   *
>     * <p>
> -   * <p/>
> +   * The default implementation throws a SolrException.
>     * Subclasses that override this method need to ensure the behavior
>     * of the analyzer is consistent with the implementation of toInternal.
>     * </p>
> @@ -518,38 +486,15 @@ public abstract class FieldType extends
>     * @see #setAnalyzer
>     * @see #getQueryAnalyzer
>     */
> -  public void setMultiTermAnalyzer(Analyzer analyzer) {
> -    SolrException e = new SolrException
> -        (ErrorCode.SERVER_ERROR,
> -            "FieldType: " + this.getClass().getSimpleName() +
> -                " (" + typeName + ") does not support specifying an analyzer");
> -    SolrException.logOnce(log, null, e);
> -    throw e;
> -  }
> -
> -  /**
> -   * Sets the Analyzer to be used when querying fields of this type.
> -   *
> -   * <p>
> -   * The default implementation throws a SolrException.
> -   * Subclasses that override this method need to ensure the behavior
> -   * of the analyzer is consistent with the implementation of toInternal.
> -   * </p>
> -   *
> -   * @see #toInternal
> -   * @see #setAnalyzer
> -   * @see #getQueryAnalyzer
> -   */
>    public void setQueryAnalyzer(Analyzer analyzer) {
>      SolrException e = new SolrException
>        (ErrorCode.SERVER_ERROR,
> -       "FieldType: " + this.getClass().getSimpleName() +
> +       "FieldType: " + this.getClass().getSimpleName() +
>         " (" + typeName + ") does not support specifying an analyzer");
>      SolrException.logOnce(log,null,e);
>      throw e;
>    }
> 
> -
>    /**
>     * Renders the specified field as XML
>     */
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/In
> dexSchema.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/schema/IndexSchema.java?rev=1206916&r1=1206915&r2=
> 1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/In
> dexSchema.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/In
> dexSchema.java Sun Nov 27 23:23:00 2011
> @@ -455,16 +455,15 @@ public final class IndexSchema {
>            if (queryAnalyzer==null) queryAnalyzer=analyzer;
>            if (analyzer==null) analyzer=queryAnalyzer;
>            if (multiAnalyzer == null) {
> -            Boolean legacyMatch = !
> solrConfig.luceneMatchVersion.onOrAfter(Version.LUCENE_36);;
> -            legacyMatch = (DOMUtil.getAttr(node, "legacyMultiTerm", null) ==
> null) ? legacyMatch :
> -                Boolean.parseBoolean(DOMUtil.getAttr(node, "legacyMultiTerm",
> null));
> -            multiAnalyzer = constructMultiTermAnalyzer(queryAnalyzer,
> legacyMatch);
> +            multiAnalyzer = constructMultiTermAnalyzer(queryAnalyzer);
>            }
> 
>            if (analyzer!=null) {
>              ft.setAnalyzer(analyzer);
>              ft.setQueryAnalyzer(queryAnalyzer);
> -            ft.setMultiTermAnalyzer(multiAnalyzer);
> +            if (ft instanceof TextField) {
> +              ((TextField)ft).setMultiTermAnalyzer(multiAnalyzer);
> +            }
>            }
>            if (ft instanceof SchemaAware){
>              schemaAware.add((SchemaAware) ft);
> @@ -708,43 +707,72 @@ public final class IndexSchema {
>      }
>    }
> 
> -  // The point here is that, if no multitermanalyzer was specified in the schema
> file, do one of several things:
> -  // 1> If legacyMultiTerm == false, assemble a new analyzer composed of all
> of the charfilters,
> -  //    lowercase filters and asciifoldingfilter.
> -  // 2> If letacyMultiTerm == true just construct the analyzer from a
> KeywordTokenizer. That should mimic current behavior.
> -  //    Do the same if they've specified that the old behavior is required
> (legacyMultiTerm="true")
> -
> -  private Analyzer constructMultiTermAnalyzer(Analyzer queryAnalyzer,
> Boolean legacyMultiTerm) {
> +  private Analyzer constructMultiTermAnalyzer(Analyzer queryAnalyzer) {
>      if (queryAnalyzer == null) return null;
> 
> -    if (legacyMultiTerm || (!(queryAnalyzer instanceof TokenizerChain))) {
> -      return new KeywordAnalyzer();
> -    }
> +    if (!(queryAnalyzer instanceof TokenizerChain)) {
> +       return new KeywordAnalyzer();
> +     }
> 
>      TokenizerChain tc = (TokenizerChain) queryAnalyzer;
> +    MultiTermChainBuilder builder = new MultiTermChainBuilder();
> +
> +    CharFilterFactory[] charFactories = tc.getCharFilterFactories();
> +    if (charFactories != null) {
> +      for (CharFilterFactory fact : charFactories) {
> +        builder.add(fact);
> +      }
> +    }
> 
> -    // we know it'll never be longer than this unless the code below is explicitly
> changed
> -    TokenFilterFactory[] filters = new TokenFilterFactory[2];
> -    int idx = 0;
> -    for (TokenFilterFactory factory : tc.getTokenFilterFactories()) {
> -      if (factory instanceof LowerCaseFilterFactory) {
> -        filters[idx] = new LowerCaseFilterFactory();
> -        filters[idx++].init(factory.getArgs());
> -      }
> -      if (factory instanceof ASCIIFoldingFilterFactory) {
> -        filters[idx] = new ASCIIFoldingFilterFactory();
> -        filters[idx++].init(factory.getArgs());
> -      }
> -    }
> -    WhitespaceTokenizerFactory white = new WhitespaceTokenizerFactory();
> -    white.init(tc.getTokenizerFactory().getArgs());
> -
> -    TokenFilterFactory[] filterSplice = new TokenFilterFactory[idx];
> -    System.arraycopy(filters, 0, filterSplice, 0, idx);
> -    return new TokenizerChain(tc.getCharFilterFactories(),
> -        white, filterSplice);
> +    builder.add(tc.getTokenizerFactory());
> +
> +    for (TokenFilterFactory fact : tc.getTokenFilterFactories()) {
> +      builder.add(fact);
> +    }
> +
> +    return builder.build();
>    }
> 
> +  private static class MultiTermChainBuilder {
> +    static final KeywordTokenizerFactory keyFactory;
> +
> +    static {
> +      keyFactory = new KeywordTokenizerFactory();
> +      keyFactory.init(new HashMap<String,String>());
> +    }
> +
> +    ArrayList<CharFilterFactory> charFilters = null;
> +    ArrayList<TokenFilterFactory> filters = new
> ArrayList<TokenFilterFactory>(2);
> +    TokenizerFactory tokenizer = keyFactory;
> +
> +    public void add(Object current) {
> +      if (!(current instanceof MultiTermAwareComponent)) return;
> +      Object newComponent =
> ((MultiTermAwareComponent)current).getMultiTermComponent();
> +      if (newComponent instanceof TokenFilterFactory) {
> +        if (filters == null) {
> +          filters = new ArrayList<TokenFilterFactory>(2);
> +        }
> +        filters.add((TokenFilterFactory)newComponent);
> +      } else if (newComponent instanceof TokenizerFactory) {
> +        tokenizer = (TokenizerFactory)newComponent;
> +      } else if (newComponent instanceof CharFilterFactory) {
> +        if (charFilters == null) {
> +          charFilters = new ArrayList<CharFilterFactory>(1);
> +        }
> +        charFilters.add( (CharFilterFactory)newComponent);
> +
> +      } else {
> +        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
> "Unknown analysis component from MultiTermAwareComponent: " +
> newComponent);
> +      }
> +    }
> +
> +    public TokenizerChain build() {
> +      CharFilterFactory[] charFilterArr =  charFilters == null ? null :
> charFilters.toArray(new CharFilterFactory[charFilters.size()]);
> +      TokenFilterFactory[] filterArr = filters == null ? new TokenFilterFactory[0] :
> filters.toArray(new TokenFilterFactory[filters.size()]);
> +      return new TokenizerChain(charFilterArr, tokenizer, filterArr);
> +    }
> +
> +  }
>    /**
>     * Register one or more new Dynamic Field with the Schema.
>     * @param f The {@link org.apache.solr.schema.SchemaField}
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/S
> chemaField.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/schema/SchemaField.java?rev=1206916&r1=1206915&r2=
> 1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/S
> chemaField.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/S
> chemaField.java Sun Nov 27 23:23:00 2011
> @@ -99,10 +99,6 @@ public final class SchemaField extends F
>    boolean isTokenized() { return (properties & TOKENIZED)!=0; }
>    boolean isBinary() { return (properties & BINARY)!=0; }
> 
> -  boolean legacyMultiTerm() {
> -    return (properties & LEGACY_MULTITERM) != 0;
> -  }
> -
>    public Fieldable createField(String val, float boost) {
>      return type.createField(this,val,boost);
>    }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/T
> extField.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/schema/TextField.java?rev=1206916&r1=1206915&r2=120
> 6916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/T
> extField.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/schema/T
> extField.java Sun Nov 27 23:23:00 2011
> @@ -17,13 +17,7 @@
> 
>  package org.apache.solr.schema;
> 
> -import org.apache.lucene.search.SortField;
> -import org.apache.lucene.search.Query;
> -import org.apache.lucene.search.PhraseQuery;
> -import org.apache.lucene.search.TermQuery;
> -import org.apache.lucene.search.BooleanQuery;
> -import org.apache.lucene.search.BooleanClause;
> -import org.apache.lucene.search.MultiPhraseQuery;
> +import org.apache.lucene.search.*;
>  import org.apache.lucene.document.Fieldable;
>  import org.apache.lucene.index.Term;
>  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
> @@ -31,6 +25,7 @@ import org.apache.lucene.analysis.tokena
>  import org.apache.lucene.analysis.CachingTokenFilter;
>  import org.apache.lucene.analysis.TokenStream;
>  import org.apache.lucene.analysis.Analyzer;
> +import org.apache.solr.common.SolrException;
>  import org.apache.solr.response.TextResponseWriter;
>  import org.apache.solr.response.XMLWriter;
>  import org.apache.solr.search.QParser;
> @@ -48,6 +43,19 @@ import java.io.StringReader;
>  public class TextField extends FieldType {
>    protected boolean autoGeneratePhraseQueries;
> 
> +  /**
> +   * Analyzer set by schema for text types to use when searching fields
> +   * of this type, subclasses can set analyzer themselves or override
> +   * getAnalyzer()
> +   * This analyzer is used to process wildcard, prefix, regex and other
> multiterm queries. It
> +   * assembles a list of tokenizer +filters that "make sense" for this, primarily
> accent folding and
> +   * lowercasing filters, and charfilters.
> +   *
> +   * @see #getMultiTermAnalyzer
> +   * @see #setMultiTermAnalyzer
> +   */
> +  protected Analyzer multiTermAnalyzer=null;
> +
>    @Override
>    protected void init(IndexSchema schema, Map<String,String> args) {
>      properties |= TOKENIZED;
> @@ -63,6 +71,21 @@ public class TextField extends FieldType
>      super.init(schema, args);
>    }
> 
> +  /**
> +   * Returns the Analyzer to be used when searching fields of this type when
> mult-term queries are specified.
> +   * <p>
> +   * This method may be called many times, at any time.
> +   * </p>
> +   * @see #getAnalyzer
> +   */
> +  public Analyzer getMultiTermAnalyzer() {
> +    return multiTermAnalyzer;
> +  }
> +
> +  public void setMultiTermAnalyzer(Analyzer analyzer) {
> +    this.multiTermAnalyzer = analyzer;
> +  }
> +
>    public boolean getAutoGeneratePhraseQueries() {
>      return autoGeneratePhraseQueries;
>    }
> @@ -98,11 +121,51 @@ public class TextField extends FieldType
>      this.queryAnalyzer = analyzer;
>    }
> 
> +
>    @Override
> -  public void setMultiTermAnalyzer(Analyzer analyzer) {
> -    this.multiTermAnalyzer = analyzer;
> +  public Query getRangeQuery(QParser parser, SchemaField field, String
> part1, String part2, boolean minInclusive, boolean maxInclusive) {
> +    Analyzer multiAnalyzer = getMultiTermAnalyzer();
> +    String lower = analyzeMultiTerm(field.getName(), part1, multiAnalyzer);
> +    String upper = analyzeMultiTerm(field.getName(), part2, multiAnalyzer);
> +    return new TermRangeQuery(field.getName(), lower, upper, minInclusive,
> maxInclusive);
> +  }
> +
> +  public String analyzeMultiTerm(String field, String part, Analyzer analyzerIn)
> {
> +    if (part == null) return null;
> +
> +    TokenStream source;
> +    if (analyzerIn == null) analyzerIn = multiTermAnalyzer;
> +    try {
> +      source = analyzerIn.tokenStream(field, new StringReader(part));
> +      source.reset();
> +    } catch (IOException e) {
> +      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unable
> to initialize TokenStream to analyze multiTerm term: " + part, e);
> +    }
> +    CharTermAttribute termAtt = source.getAttribute(CharTermAttribute.class);
> +    String termRet = "";
> +
> +    try {
> +      if (!source.incrementToken())
> +        throw new IllegalArgumentException("analyzer returned no terms for
> multiTerm term: " + part);
> +      termRet = termAtt.toString();
> +      if (source.incrementToken())
> +        throw new IllegalArgumentException("analyzer returned too many terms
> for multiTerm term: " + part);
> +    } catch (IOException e) {
> +      throw new RuntimeException("error analyzing range part: " + part, e);
> +    }
> +
> +    try {
> +      source.end();
> +      source.close();
> +    } catch (IOException e) {
> +      throw new RuntimeException("Unable to end & close TokenStream after
> analyzing multiTerm term: " + part, e);
> +    }
> +
> +    return termRet;
> +
>    }
> 
> +
>    static Query parseFieldQuery(QParser parser, Analyzer analyzer, String field,
> String queryText) {
>      int phraseSlop = 0;
>      boolean enablePositionIncrements = true;
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/search/Sol
> rQueryParser.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/ja
> va/org/apache/solr/search/SolrQueryParser.java?rev=1206916&r1=1206915&r
> 2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/search/Sol
> rQueryParser.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/java/org/apache/solr/search/Sol
> rQueryParser.java Sun Nov 27 23:23:00 2011
> @@ -58,15 +58,16 @@ public class SolrQueryParser extends Que
>    protected final IndexSchema schema;
>    protected final QParser parser;
>    protected final String defaultField;
> -  protected final Map<String, ReversedWildcardFilterFactory>
> leadingWildcards =
> -    new HashMap<String, ReversedWildcardFilterFactory>();
> 
> -  /**
> +  // implementation detail - caching ReversedWildcardFilterFactory based on
> type
> +  private Map<FieldType, ReversedWildcardFilterFactory> leadingWildcards;
> +
> +   /**
>     * Constructs a SolrQueryParser using the schema to understand the
>     * formats and datatypes of each field.  Only the defaultSearchField
>     * will be used from the IndexSchema (unless overridden),
>     * &lt;solrQueryParser&gt; will not be used.
> -   *
> +   *
>     * @param schema Used for default search field name if defaultField is null
> and field information is used for analysis
>     * @param defaultField default field used for unspecified search terms.  if
> null, the schema default field is used
>     * @see IndexSchema#getDefaultSearchFieldName()
> @@ -78,7 +79,8 @@ public class SolrQueryParser extends Que
>      this.defaultField = defaultField;
>      setLowercaseExpandedTerms(false);
>      setEnablePositionIncrements(true);
> -    checkAllowLeadingWildcards();
> +    setLowercaseExpandedTerms(false);
> +    setAllowLeadingWildcard(true);
>    }
> 
>    public SolrQueryParser(QParser parser, String defaultField) {
> @@ -92,30 +94,34 @@ public class SolrQueryParser extends Que
>      this.defaultField = defaultField;
>      setLowercaseExpandedTerms(false);
>      setEnablePositionIncrements(true);
> -    checkAllowLeadingWildcards();
> +    setLowercaseExpandedTerms(false);
> +    setAllowLeadingWildcard(true);
>    }
> 
> -  protected void checkAllowLeadingWildcards() {
> -    boolean allow = false;
> -    for (Entry<String, FieldType> e : schema.getFieldTypes().entrySet()) {
> -      Analyzer a = e.getValue().getAnalyzer();
> -      if (a instanceof TokenizerChain) {
> -        // examine the indexing analysis chain if it supports leading wildcards
> -        TokenizerChain tc = (TokenizerChain)a;
> -        TokenFilterFactory[] factories = tc.getTokenFilterFactories();
> -        for (TokenFilterFactory factory : factories) {
> -          if (factory instanceof ReversedWildcardFilterFactory) {
> -            allow = true;
> -            leadingWildcards.put(e.getKey(),
> (ReversedWildcardFilterFactory)factory);
> -          }
> +  protected ReversedWildcardFilterFactory
> getReversedWildcardFilterFactory(FieldType fieldType) {
> +    if (leadingWildcards == null) leadingWildcards = new HashMap<FieldType,
> ReversedWildcardFilterFactory>();
> +    ReversedWildcardFilterFactory fac = leadingWildcards.get(fieldType);
> +    if (fac == null && leadingWildcards.containsKey(fac)) {
> +      return fac;
> +    }
> +
> +    Analyzer a = fieldType.getAnalyzer();
> +    if (a instanceof TokenizerChain) {
> +      // examine the indexing analysis chain if it supports leading wildcards
> +      TokenizerChain tc = (TokenizerChain)a;
> +      TokenFilterFactory[] factories = tc.getTokenFilterFactories();
> +      for (TokenFilterFactory factory : factories) {
> +        if (factory instanceof ReversedWildcardFilterFactory) {
> +          fac = (ReversedWildcardFilterFactory)factory;
> +          break;
>          }
>        }
>      }
> -    // XXX should be enabled on a per-field basis
> -    if (allow) {
> -      setAllowLeadingWildcard(true);
> -    }
> +
> +    leadingWildcards.put(fieldType, fac);
> +    return fac;
>    }
> +
> 
>    private void checkNullField(String field) throws SolrException {
>      if (field == null && defaultField == null) {
> @@ -125,12 +131,12 @@ public class SolrQueryParser extends Que
>      }
>    }
> 
> -  protected String analyzeIfMultitermTermText(String field, String part,
> Analyzer analyzer) {
> +  protected String analyzeIfMultitermTermText(String field, String part,
> FieldType fieldType) {
>      if (part == null) return part;
> 
>      SchemaField sf = schema.getFieldOrNull((field));
> -    if (sf == null || !(sf.getType() instanceof TextField)) return part;
> -    return analyzeMultitermTerm(field, part, analyzer);
> +    if (sf == null || ! (fieldType instanceof TextField)) return part;
> +    return ((TextField)fieldType).analyzeMultiTerm(field, part,
> ((TextField)fieldType).getMultiTermAnalyzer());
>    }
> 
>    @Override
> @@ -168,8 +174,6 @@ public class SolrQueryParser extends Que
>    @Override
>    protected Query getRangeQuery(String field, String part1, String part2,
> boolean inclusive) throws ParseException {
>      checkNullField(field);
> -    part1 = analyzeIfMultitermTermText(field, part1,
> schema.getFieldType(field).getMultiTermAnalyzer());
> -    part2 = analyzeIfMultitermTermText(field, part2,
> schema.getFieldType(field).getMultiTermAnalyzer());
> 
>      SchemaField sf = schema.getField(field);
>      return sf.getType().getRangeQuery(parser, sf,
> @@ -185,21 +189,10 @@ public class SolrQueryParser extends Que
>        termStr = termStr.toLowerCase();
>      }
> 
> -    termStr = analyzeIfMultitermTermText(field, termStr,
> schema.getFieldType(field).getMultiTermAnalyzer());
> +    termStr = analyzeIfMultitermTermText(field, termStr,
> schema.getFieldType(field));
> 
> -    // TODO: toInternal() won't necessarily work on partial
> -    // values, so it looks like we need a getPrefix() function
> -    // on fieldtype?  Or at the minimum, a method on fieldType
> -    // that can tell me if I should lowercase or not...
> -    // Schema could tell if lowercase filter is in the chain,
> -    // but a more sure way would be to run something through
> -    // the first time and check if it got lowercased.
> -
> -    // TODO: throw exception if field type doesn't support prefixes?
> -    // (sortable numeric types don't do prefixes, but can do range queries)
> -    Term t = new Term(field, termStr);
> -    PrefixQuery prefixQuery = new PrefixQuery(t);
> -    return prefixQuery;
> +    // Solr has always used constant scoring for prefix queries.  This should
> return constant scoring by default.
> +    return newPrefixQuery(new Term(field, termStr));
>    }
>    @Override
>    protected Query getWildcardQuery(String field, String termStr) throws
> ParseException {
> @@ -207,11 +200,11 @@ public class SolrQueryParser extends Que
>      if ("*".equals(field) && "*".equals(termStr)) {
>        return newMatchAllDocsQuery();
>      }
> -    termStr = analyzeIfMultitermTermText(field, termStr,
> schema.getFieldType(field).getMultiTermAnalyzer());
> +    FieldType fieldType = schema.getFieldType(field);
> +    termStr = analyzeIfMultitermTermText(field, termStr, fieldType);
> 
>      // can we use reversed wildcards in this field?
> -    String type = schema.getFieldType(field).getTypeName();
> -    ReversedWildcardFilterFactory factory = leadingWildcards.get(type);
> +    ReversedWildcardFilterFactory factory =
> getReversedWildcardFilterFactory(fieldType);
>      if (factory != null && factory.shouldReverse(termStr)) {
>        int len = termStr.length();
>        char[] chars = new char[len+1];
> @@ -220,13 +213,8 @@ public class SolrQueryParser extends Que
>        ReversedWildcardFilter.reverse(chars, 1, len);
>        termStr = new String(chars);
>      }
> -    Query q = super.getWildcardQuery(field, termStr);
> -    if (q instanceof WildcardQuery) {
> -      // use a constant score query to avoid overflowing clauses
> -      WildcardQuery wildcardQuery = new
> WildcardQuery(((WildcardQuery)q).getTerm());
> -      return  wildcardQuery;
> -    }
> -    return q;
> -  }
> 
> +    // Solr has always used constant scoring for wildcard queries.  This should
> return constant scoring by default.
> +    return newWildcardQuery(new Term(field, termStr));
> +  }
>  }
> 
> Modified: lucene/dev/branches/branch_3x/solr/core/src/test-
> files/solr/conf/schema-folding.xml
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/te
> st-files/solr/conf/schema-
> folding.xml?rev=1206916&r1=1206915&r2=1206916&view=diff
> ================================================================
> ==============
> --- lucene/dev/branches/branch_3x/solr/core/src/test-files/solr/conf/schema-
> folding.xml (original)
> +++ lucene/dev/branches/branch_3x/solr/core/src/test-files/solr/conf/schema-
> folding.xml Sun Nov 27 23:23:00 2011
> @@ -64,7 +64,7 @@
>        </analyzer>
>      </fieldType>
> 
> -    <fieldType name="text_rev" class="solr.TextField"
> legacyMultiTerm="false">
> +    <fieldType name="text_rev" class="solr.TextField">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
> @@ -80,12 +80,25 @@
>        </analyzer>
>      </fieldType>
> 
> -    <fieldType name="text_lower_tokenizer" class="solr.TextField">
> +    <fieldType name="text_lower_token" class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.LowerCaseTokenizerFactory"/>
> +        <filter class="solr.ASCIIFoldingFilterFactory"/>
> +      </analyzer>
> +    </fieldType>
> +
> +    <fieldType name="text_oldstyle" class="solr.TextField">
> +      <analyzer>
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +        <filter class="solr.LowerCaseFilterFactory"/>
> +        <filter class="solr.ASCIIFoldingFilterFactory"/>
> +      </analyzer>
> +      <analyzer type="multiterm">
> +        <tokenizer class="solr.KeywordTokenizerFactory" />
>        </analyzer>
>      </fieldType>
> 
> +
>      <fieldType name="text_charfilter" class="solr.TextField"
> multiValued="false">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> @@ -99,19 +112,47 @@
>        </analyzer>
>      </fieldType>
> 
> -    <fieldType name="text_oldstyle" class="solr.TextField" multiValued="false"
> legacyMultiTerm="true">
> +    <fieldType name="text_straight" class="solr.TextField">
> +      <analyzer>
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +      </analyzer>
> +    </fieldType>
> +
> +    <fieldType name="text_lower" class="solr.TextField">
> +      <analyzer>
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +        <filter class="solr.LowerCaseFilterFactory"/>
> +      </analyzer>
> +    </fieldType>
> +
> +    <fieldType name="text_folding" class="solr.TextField">
> +      <analyzer>
> +        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> +        <filter class="solr.ASCIIFoldingFilterFactory"/>
> +      </analyzer>
> +    </fieldType>
> +
> +    <fieldType name="text_stemming" class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.ASCIIFoldingFilterFactory"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
> -        <filter class="solr.TrimFilterFactory"/>
> +        <filter class="solr.PorterStemFilterFactory"/>
> +      </analyzer>
> +    </fieldType>
> +
> +    <fieldType name="text_keyword" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
> +      <analyzer>
> +         <tokenizer class="solr.KeywordTokenizerFactory"/>
> +        <filter class="solr.LowerCaseFilterFactory" />
>        </analyzer>
>      </fieldType>
> 
> -    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
> -    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
> -    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
> -    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
> +
> +    <fieldType name="int" class="solr.TrieIntField" precisionStep="4"
> omitNorms="true" positionIncrementGap="0"/>
> +    <fieldType name="float" class="solr.TrieFloatField" precisionStep="4"
> omitNorms="true" positionIncrementGap="0"/>
> +    <fieldType name="long" class="solr.TrieLongField" precisionStep="4"
> omitNorms="true" positionIncrementGap="0"/>
> +    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="4"
> omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="byte" class="solr.ByteField" omitNorms="true"
> positionIncrementGap="0"/>
>      <fieldType name="short" class="solr.ShortField" omitNorms="true"
> positionIncrementGap="0"/>
>      <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true"/>
> @@ -133,10 +174,17 @@
>      <field name="content_ws" type="text_ws" indexed="true" stored="true"/>
>      <field name="content_rev" type="text_rev" indexed="true" stored="true"/>
>      <field name="content_multi" type="text_multi" indexed="true"
> stored="true"/>
> -    <field name="content_lower_token" type="text_multi" indexed="true"
> stored="true"/>
> +    <field name="content_lower_token" type="text_lower_token"
> indexed="true" stored="true"/>
>      <field name="content_oldstyle" type="text_oldstyle" indexed="true"
> stored="true"/>
>      <field name="content_charfilter" type="text_charfilter" indexed="true"
> stored="true"/>
>      <field name="content_multi_bad" type="text_multi_bad" indexed="true"
> stored="true"/>
> +
> +    <dynamicField name="*_straight" type="text_straight" indexed="true"
> stored="true"/>
> +    <dynamicField name="*_lower" type="text_lower" indexed="true"
> stored="true"/>
> +    <dynamicField name="*_folding" type="text_folding" indexed="true"
> stored="true"/>
> +    <dynamicField name="*_stemming" type="text_stemming" indexed="true"
> stored="true"/>
> +    <dynamicField name="*_keyword" type="text_keyword" indexed="true"
> stored="true"/>
> +
>    </fields>
> 
>    <defaultSearchField>content</defaultSearchField>
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/schema/M
> ultiTermTest.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/te
> st/org/apache/solr/schema/MultiTermTest.java?rev=1206916&r1=1206915&r2
> =1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/schema/M
> ultiTermTest.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/schema/M
> ultiTermTest.java Sun Nov 27 23:23:00 2011
> @@ -36,7 +36,7 @@ public class MultiTermTest extends SolrT
>    @Test
>    public void testMultiFound() {
>      SchemaField field = h.getCore().getSchema().getField("content_multi");
> -    Analyzer analyzer = field.getType().getMultiTermAnalyzer();
> +    Analyzer analyzer = ((TextField)field.getType()).getMultiTermAnalyzer();
>      assertTrue(analyzer instanceof TokenizerChain);
>      assertTrue(((TokenizerChain) analyzer).getTokenizerFactory() instanceof
> WhitespaceTokenizerFactory);
>      TokenizerChain tc = (TokenizerChain) analyzer;
> @@ -58,9 +58,9 @@ public class MultiTermTest extends SolrT
>    @Test
>    public void testQueryCopiedToMulti() {
>      SchemaField field = h.getCore().getSchema().getField("content_charfilter");
> -    Analyzer analyzer = field.getType().getMultiTermAnalyzer();
> +    Analyzer analyzer = ((TextField)field.getType()).getMultiTermAnalyzer();
>      assertTrue(analyzer instanceof TokenizerChain);
> -    assertTrue(((TokenizerChain) analyzer).getTokenizerFactory() instanceof
> WhitespaceTokenizerFactory);
> +    assertTrue(((TokenizerChain) analyzer).getTokenizerFactory() instanceof
> KeywordTokenizerFactory);
>      TokenizerChain tc = (TokenizerChain) analyzer;
>      for (TokenFilterFactory factory : tc.getTokenFilterFactories()) {
>        assertTrue(factory instanceof LowerCaseFilterFactory);
> @@ -73,15 +73,15 @@ public class MultiTermTest extends SolrT
>    @Test
>    public void testDefaultCopiedToMulti() {
>      SchemaField field = h.getCore().getSchema().getField("content_ws");
> -    Analyzer analyzer = field.getType().getMultiTermAnalyzer();
> +    Analyzer analyzer = ((TextField)field.getType()).getMultiTermAnalyzer();
>      assertTrue(analyzer instanceof TokenizerChain);
> -    assertTrue(((TokenizerChain) analyzer).getTokenizerFactory() instanceof
> WhitespaceTokenizerFactory);
> +    assertTrue(((TokenizerChain) analyzer).getTokenizerFactory() instanceof
> KeywordTokenizerFactory);
>      TokenizerChain tc = (TokenizerChain) analyzer;
>      for (TokenFilterFactory factory : tc.getTokenFilterFactories()) {
>        assertTrue((factory instanceof ASCIIFoldingFilterFactory) || (factory
> instanceof LowerCaseFilterFactory));
>      }
> 
> -    assertTrue(tc.getCharFilterFactories().length == 0);
> +    assertTrue(tc.getCharFilterFactories() == null);
> 
>    }
>  }
> 
> Modified:
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/search/Tes
> tFoldingMultitermQuery.java
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/core/src/te
> st/org/apache/solr/search/TestFoldingMultitermQuery.java?rev=1206916&r1=
> 1206915&r2=1206916&view=diff
> ================================================================
> ==============
> ---
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/search/Tes
> tFoldingMultitermQuery.java (original)
> +++
> lucene/dev/branches/branch_3x/solr/core/src/test/org/apache/solr/search/Tes
> tFoldingMultitermQuery.java Sun Nov 27 23:23:00 2011
> @@ -59,7 +59,12 @@ public class TestFoldingMultitermQuery e
>            "content_lower_token", docs[i],
>            "content_oldstyle", docs[i],
>            "content_charfilter", docs[i],
> -          "content_multi_bad", docs[i]
> +          "content_multi_bad", docs[i],
> +          "content_straight", docs[i],
> +          "content_lower", docs[i],
> +          "content_folding", docs[i],
> +          "content_stemming", docs[i],
> +          "content_keyword", docs[i]
>        ));
>      }
>      assertU(optimize());
> @@ -95,6 +100,8 @@ public class TestFoldingMultitermQuery e
>          assertQ(req("q", "content_lower_token:" + me),
>              "//result[@numFound='1']",
>              "//*[@name='id'][.='" + Integer.toString(idx) + "']");
> +        assertQ(req("q", "content_oldstyle:" + me),
> +            "//result[@numFound='0']");
>        }
>      }
>      for (int idx = 0; idx < matchRevPrefixUpper.length; idx++) {
> @@ -128,13 +135,31 @@ public class TestFoldingMultitermQuery e
>          assertQ(req("q", "content_multi:" + me),
>              "//result[@numFound='1']",
>              "//*[@name='id'][.='" + Integer.toString(idx) + "']");
> -        assertQ(req("q", "content_lower_token:" + me),
> -            "//result[@numFound='1']",
> -            "//*[@name='id'][.='" + Integer.toString(idx) + "']");
> +        assertQ(req("q", "content_oldstyle:" + me),
> +            "//result[@numFound='0']");
>        }
>      }
>    }
> 
> +  @Test
> +  public void testLowerTokenizer() {
> +    // The lowercasetokenizer will remove the '1' from the index, but not from
> the query, thus the special test.
> +    assertQ(req("q", "content_lower_token:Á*C*"),
> "//result[@numFound='1']");
> +    assertQ(req("q", "content_lower_token:Á*C*1"),
> "//result[@numFound='0']");
> +    assertQ(req("q", "content_lower_token:h*1"), "//result[@numFound='0']");
> +    assertQ(req("q", "content_lower_token:H*1"), "//result[@numFound='0']");
> +    assertQ(req("q", "content_lower_token:*1"), "//result[@numFound='0']");
> +    assertQ(req("q", "content_lower_token:HÏ*l?*"),
> "//result[@numFound='1']");
> +    assertQ(req("q", "content_lower_token:hȉ*l?*"),
> "//result[@numFound='1']");
> +  }
> +
> +
> +  @Test
> +  public void testGeneral() throws Exception {
> +    assertQ(req("q", "content_stemming:fings*"), "//result[@numFound='0']");
> // should not match (but would if fings* was stemmed to fing*
> +    assertQ(req("q", "content_stemming:fing*"), "//result[@numFound='1']");
> +  }
> +
>    // Phrases should fail. This test is mainly a marker so if phrases ever do start
> working with wildcards we go
>    // and update the documentation
>    @Test
> @@ -143,17 +168,14 @@ public class TestFoldingMultitermQuery e
>          "//result[@numFound='0']");
>    }
> 
> -  // Make sure the legacy behavior flag is honored
> -  @Test
> -  public void testLegacyBehavior() {
> -    assertQ(req("q", "content_oldstyle:ABCD*"),
> -        "//result[@numFound='0']");
> -  }
> -
>    @Test
>    public void testWildcardRange() {
>      assertQ(req("q", "content:[* TO *]"),
>          "//result[@numFound='3']");
> +    assertQ(req("q", "content:[AB* TO Z*]"),
> +        "//result[@numFound='3']");
> +    assertQ(req("q", "content:[AB*E?G* TO TU*W]"),
> +        "//result[@numFound='3']");
>    }
> 
> 
> @@ -222,10 +244,13 @@ public class TestFoldingMultitermQuery e
>    @Test
>    public void testMultiBad() {
>      try {
> +      ignoreException("analyzer returned too many terms");
>        assertQ(req("q", "content_multi_bad:" + "abCD*"));
>        fail("Should throw exception when token evaluates to more than one
> term");
>      } catch (Exception expected) {
> -      assertTrue(expected.getCause() instanceof IllegalArgumentException);
> +      assertTrue(expected.getCause() instanceof
> java.lang.IllegalArgumentException);
> +    } finally {
> +      resetExceptionIgnores();
>      }
>    }
>  }
> \ No newline at end of file
> 
> Modified: lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml
> URL:
> http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/example/so
> lr/conf/schema.xml?rev=1206916&r1=1206915&r2=1206916&view=diff
> ================================================================
> ==============
> --- lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml
> (original)
> +++ lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml Sun
> Nov 27 23:23:00 2011
> @@ -444,41 +444,6 @@
>        </analyzer>
>      </fieldType>
> 
> -    <!-- Illustrates the new "multiterm" analyzer definition the <fieldType> can
> take a new
> -         parameter legacyMultiTerm="true" if the old behvaior is desired. The
> new default
> -         behavior as of 3.6+ is to automatically define a multiterm analyzer
> -    -->
> -    <fieldType name="text_multiterm" class="solr.TextField"
> positionIncrementGap="100">
> -      <analyzer type="index">
> -        <tokenizer class="solr.StandardTokenizerFactory"/>
> -        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> -        <filter class="solr.LowerCaseFilterFactory"/>
> -      </analyzer>
> -      <analyzer type="query">
> -        <tokenizer class="solr.StandardTokenizerFactory"/>
> -        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> -        <filter class="solr.LowerCaseFilterFactory"/>
> -      </analyzer>
> -      <!-- Illustrates the use of a new analyzer type "multiterm". See the Wiki
> page "Multiterm
> -           Query Analysis" and SOLR-2438 for full details. The short form is that
> this analyzer is
> -           applied to wildcard terms (prefix, wildcard range) if specified. This
> allows, among other
> -           things, not having to lowercase wildcard terms on the client.
> -
> -           In the absence of this section, the new default behavior (3.6, 4.0) is to
> construct
> -           one of these from the query analyzer that incorporates any defined
> charfilters, a
> -           WhitespaceTokenizer, a LowerCaseFilter (if defined), and an
> ASCIIFoldingFilter
> -           (if defined).
> -
> -           Arguably, this is an expert-level analyzer, most cases will be handled by
> an instance
> -           of this being automatically constructed from the queryanalyzer.
> -
> -      -->
> -      <analyzer type="multiterm">
> -        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> -        <filter class="solr.LowerCaseFilterFactory"/>
> -        <filter class="solr.ASCIIFoldingFilterFactory"/>
> -      </analyzer>
> -    </fieldType>
> 
>      <!-- since fields of this type are by default not stored or indexed,
>           any data added to them will be ignored outright.  -->
> @@ -601,6 +566,7 @@
>     <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
>     <dynamicField name="*_t"  type="text_general"    indexed="true"
> stored="true"/>
>     <dynamicField name="*_txt" type="text_general"    indexed="true"
> stored="true" multiValued="true"/>
> +   <dynamicField name="*_en"  type="text_en"    indexed="true"
> stored="true" multiValued="true" />
>     <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
>     <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
>     <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org