You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by John Marks <a8...@gmail.com> on 2009/03/05 16:17:37 UTC

indexing but not tokenizing

Hi all,

I'm not able to see what's wrong in the following sample code.
I'm indexing a document with 5 fields, using five different indexing strategies.
I'm fine the the results for 4 of them, but field B is causing me some
trouble in understanding what's going on.

The value of field B is X (uppercase).
The analyzer is a SimpleAnalyzer, which I use on the QueryParser as well.
But when I search for X (uppercase) on field B, the X is converted to lowercase.
Now, I know that SimpleAnalyzer converts to lowercase, but I was
expecting it not to do so on field B, which is NOT_ANALYZED.

How should I fix my code?

Thank you in advance!
-John



--- code ---


package test;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.QueryParser;



public class Test
{
  public static void main(String[] args)
  {
    try
    {
      RAMDirectory idx = new RAMDirectory();
      SimpleAnalyzer analyzer = new SimpleAnalyzer();

      IndexWriter writer = new IndexWriter(idx, analyzer, true,
          IndexWriter.MaxFieldLength.LIMITED);

      Document doc = new Document();
      doc.add(new Field("A", "X",
          Field.Store.YES, Field.Index.NO));
      doc.add(new Field("B", "X",
          Field.Store.YES, Field.Index.NOT_ANALYZED));
      doc.add(new Field("C", "X",
          Field.Store.YES, Field.Index.ANALYZED));
      doc.add(new Field("D", "x",
          Field.Store.NO, Field.Index.NOT_ANALYZED));
      doc.add(new Field("E", "X",
          Field.Store.NO, Field.Index.ANALYZED));
      writer.addDocument(doc);
      writer.close();

      IndexSearcher searcher = new IndexSearcher(idx);
      String field = "B";
      QueryParser parser = new QueryParser(field, analyzer);
      Query query = parser.parse("X");
      System.out.println("Query: " + query.toString());

      TopDocCollector collector = new TopDocCollector(1);
      searcher.search(query, collector);
      int numHits = collector.getTotalHits();
      System.out.println(numHits + " total matching documents");

      if ( numHits > 0)
      {
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        doc = searcher.doc(hits[0].doc);
        System.out.println("A: " + doc.get("A"));
        System.out.println("B: " + doc.get("B"));
        System.out.println("C: " + doc.get("C"));
        System.out.println("D: " + doc.get("D"));
        System.out.println("E: " + doc.get("E"));
      }
    }
    catch (Exception e)
    {
      System.out.println(" caught a " + e.getClass() + "\n with message: "
          + e.getMessage());
    }
  }

}

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: indexing but not tokenizing

Posted by Ian Lea <ia...@gmail.com>.

Hi


I think that the SimpleAnalyzer you are passing to the query parser
will be downcasing the X.  You can fix it using an analyzer that
doesn't convert to lower case, creating the query directly in code, or
by using PerFieldAnalyzerWrapper, and no doubt other ways too.

If you want a direct suggestion: use PerFieldAnalyzerWrapper,
specifying a different analyzer for field B.


--
Ian.


On Thu, Mar 5, 2009 at 3:17 PM, John Marks <a8...@gmail.com> wrote:
> Hi all,
>
> I'm not able to see what's wrong in the following sample code.
> I'm indexing a document with 5 fields, using five different indexing strategies.
> I'm fine the the results for 4 of them, but field B is causing me some
> trouble in understanding what's going on.
>
> The value of field B is X (uppercase).
> The analyzer is a SimpleAnalyzer, which I use on the QueryParser as well.
> But when I search for X (uppercase) on field B, the X is converted to lowercase.
> Now, I know that SimpleAnalyzer converts to lowercase, but I was
> expecting it not to do so on field B, which is NOT_ANALYZED.
>
> How should I fix my code?
>
> Thank you in advance!
> -John
>
>
>
> --- code ---
>
>
> package test;
>
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TopDocCollector;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.queryParser.QueryParser;
>
>
>
> public class Test
> {
>   public static void main(String[] args)
>   {
>     try
>     {
>       RAMDirectory idx = new RAMDirectory();
>       SimpleAnalyzer analyzer = new SimpleAnalyzer();
>
>       IndexWriter writer = new IndexWriter(idx, analyzer, true,
>           IndexWriter.MaxFieldLength.LIMITED);
>
>       Document doc = new Document();
>       doc.add(new Field("A", "X",
>           Field.Store.YES, Field.Index.NO));
>       doc.add(new Field("B", "X",
>           Field.Store.YES, Field.Index.NOT_ANALYZED));
>       doc.add(new Field("C", "X",
>           Field.Store.YES, Field.Index.ANALYZED));
>       doc.add(new Field("D", "x",
>           Field.Store.NO, Field.Index.NOT_ANALYZED));
>       doc.add(new Field("E", "X",
>           Field.Store.NO, Field.Index.ANALYZED));
>       writer.addDocument(doc);
>       writer.close();
>
>       IndexSearcher searcher = new IndexSearcher(idx);
>       String field = "B";
>       QueryParser parser = new QueryParser(field, analyzer);
>       Query query = parser.parse("X");
>       System.out.println("Query: " + query.toString());
>
>       TopDocCollector collector = new TopDocCollector(1);
>       searcher.search(query, collector);
>       int numHits = collector.getTotalHits();
>       System.out.println(numHits + " total matching documents");
>
>       if ( numHits > 0)
>       {
>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>         doc = searcher.doc(hits[0].doc);
>         System.out.println("A: " + doc.get("A"));
>         System.out.println("B: " + doc.get("B"));
>         System.out.println("C: " + doc.get("C"));
>         System.out.println("D: " + doc.get("D"));
>         System.out.println("E: " + doc.get("E"));
>       }
>     }
>     catch (Exception e)
>     {
>       System.out.println(" caught a " + e.getClass() + "\n with message: "
>           + e.getMessage());
>     }
>   }
>
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org