Posted to general@lucene.apache.org by lucenewbie <br...@gmail.com> on 2011/09/29 09:31:41 UTC

Extreme Beginner Needs Lucene Help

I'm trying to write a Lucene program to index a large .txt file. 

Really, it should be extremely basic. I just want to learn how to use Lucene,
but I'm getting all sorts of strange errors when I try simple lines of code. 

Here's an example: 

		StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
		boolean recreateIndexIfExists = true;
		IndexWriter indexWriter = new IndexWriter("/indexDirectory", analyzer,
				recreateIndexIfExists);

From what I read in the documentation and the official Lucene Book, there
should be nothing wrong with that. The error I get most often is "the
IndexWriter(String, StandardAnalyzer, boolean) constructor is not defined."


So I'm going to guess the problem lies elsewhere. 

Here's a confession: I've never used any external jars when writing java
programs before. Nor have I ever needed to edit my Path or Classpath
variables. So I'm going to run through a list of things I've done to try to
"connect" Lucene; hopefully it will rule out some suggestions you might have.
But first:

- I am using Eclipse, though I also tried to run the Indexer.java program
downloaded from Manning's site in TextPad, and the problem persisted with
the same flavor of error.
- I am on a Windows OS.
- I am using Java version 1.7.
- I am using Lucene version 3.4.0.

Here are the things I've done so far:

1) Added the Lucene core and lucene demo jars to my system classpath. (While
I'm reasonably certain I did this correctly, I'm at the point where I just
feel like I've made a really stupid mistake and don't want to leave anything
out, so here is what my CLASSPATH looks like now):

.;C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;C:\Users\Nathan\Documents\School\Lucene\lucene-3.4.0\lucene-core-3.4.0.jar;C:\Users\Nathan\Documents\School\Lucene\lucene-3.4.0\contrib\demo\lucene-demo-3.4.0.jar;

(unrelated question: I'm using java 1.7 in my PATH variable. I don't know
why that reference to jre6 is still there but I didn't want to remove it.
The same folder in jre7 doesn't contain QTJava.zip so I figured it couldn't
be harmful. Anyone know what I should do there?)

2) (Somewhat related to the unrelated question:) I added both of the .jars
to the \lib\ext folders of both the jre6 folder referenced in my path and
the jre7 folder that presumably should be. I read somewhere this is
essentially the same as adding to the CLASSPATH and I did so when I was
having trouble figuring out just wtf I was doing (which I still am).

3) I configured the Build Path for my java project to include all relevant
jars (the core and demo jars as well as their javadoc jars)

4) I have import statements out the wazoo. I mention this because it took me
a while to figure out that I needed them; the book doesn't include them, nor
does it even mention their existence. Remember, this is my first time working
with external code like this. I'm sure I seem completely clueless, but I
guess I thought that referencing the jars gave me everything I needed out of
the box. 

5) I found this website:
http://jacobian.web.id/2010/08/09/how-to-use-lucene-part-1/
I copied and pasted the code to see if it would work, and it works
flawlessly. It's simply not useful because it creates the index in memory,
manually adds documents, and prints results. Since my biggest problem right
now is the index directory and referencing the file to be indexed, that
flawless program just doesn't help me. 
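
With the jars added in several places (CLASSPATH, lib\ext, the Eclipse build path), one way to see which copy is actually being picked up is to ask the JVM where a class was loaded from. The following is only a minimal sketch using the standard library; the class name ClasspathCheck and the method locate are made up for illustration, and org.apache.lucene.index.IndexWriter is just the default class to look for.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of Lucene.
class ClasspathCheck {

    /**
     * Returns a description of where the named class was loaded from,
     * or null if the class cannot be found on the classpath.
     */
    static String locate(String className) {
        try {
            // 'false' loads the class without running its static initializers.
            Class<?> c = Class.forName(className, false,
                    ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JRE itself usually report no code source.
            return src == null ? "(JRE built-in)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.lucene.index.IndexWriter";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

If the printed location is not the lucene-core jar you expect, an older or duplicate copy is probably shadowing it.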

So I'm really just at a loss. I've found several other websites with some
sample "getting started" programs but they all give me various amounts of
similarly confusing errors. 

This site was particularly useful, as it had pics of what my project should
look like in Eclipse:
http://www.avajava.com/tutorials/lessons/how-do-i-use-lucene-to-index-and-search-text-files.html?page=1

Thank you for any help you can offer. I'm sorry if it hurts your head to see
something as stupid as this. I promise my head is hurting right now, too. 




Oh and lastly, since I don't seem to have any shame, here's another probably
newbish question:

When Eclipse tells me that something is 'deprecated' (e.g. The Field
Version.LUCENE_CURRENT is deprecated), what does that mean? It's just a
warning and Eclipse suggests that I just Suppress Deprecation Warnings as a
quick fix but I'd rather know what's up. Some of the sample code I tested
had a StandardAnalyzer with nothing being passed and those gave me errors. I
had to add the Version.LUCENE_CURRENT to fix them. Additionally, the java
documentation for Lucene is riddled with deprecation this and deprecation
that. If I had to guess, I would say it's like doing something a quick and
dirty way that's kind of frowned upon. 
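
For what it's worth, here is a minimal sketch of how deprecation works in plain Java, independent of Lucene. The classes LegacyApi and DeprecationDemo are invented for illustration. A deprecated member still compiles and runs; the annotation only tells the compiler (and Eclipse) to warn callers that a replacement exists and that the old member may be removed in a later release.

```java
// Hypothetical library class for illustration; not part of Lucene.
class LegacyApi {

    /**
     * Old entry point, kept so that existing callers still compile.
     *
     * @deprecated use {@link #greet(String, String)} instead.
     */
    @Deprecated
    static String greet(String name) {
        return greet("Hello", name);
    }

    /** Newer replacement that callers are expected to migrate to. */
    static String greet(String greeting, String name) {
        return greeting + ", " + name + "!";
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        // Triggers a deprecation warning at compile time, but runs normally.
        System.out.println(LegacyApi.greet("Lucene"));
        // Preferred call: no warning.
        System.out.println(LegacyApi.greet("Hello", "Lucene"));
    }
}
```

Both calls print the same line. Suppressing the warning hides the message but does not change the behavior.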

Thanks again



--
View this message in context: http://lucene.472066.n3.nabble.com/Extreme-Beginner-Needs-Lucene-Help-tp3378560p3378560.html
Sent from the Lucene - General mailing list archive at Nabble.com.

RE: Extreme Beginner Needs Lucene Help

Posted by lucenewbie <br...@gmail.com>.
First, thanks for your reply. A few quick notes:

- I have the book Lucene in Action. Additionally, I have all the source code
for the book. They don't really address my problems.  
- I already configured the build path in Eclipse. 
- I was pretty sure that Lucene 3.4.0 addressed many of the Java 7 issues. 

Thanks for the explanation on deprecation, very informative.

Here is the Indexer.java program given in the source for Lucene in Action. I
edited this to run from a Java IDE instead of the command line (but that only
meant changing 2 files and removing the argument handling). I'll mark
errors with *****!!!******errors******!!!*****




import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.File;
import java.io.IOException;
import java.io.FileReader;
import java.util.Date;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {

    File indexDir = new File("indexDirectory");
    File dataDir = new File("filestobeindexed");

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  public static int index(File indexDir, File dataDir)
    throws IOException {

    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    IndexWriter writer = new IndexWriter("/index",
        new StandardAnalyzer(Version.LUCENE_CURRENT), true,
        IndexWriter.MaxFieldLength.LIMITED);

*****!!!******The constructor IndexWriter(String, StandardAnalyzer, boolean,
IndexWriter.MaxFieldLength) is undefined, MaxFieldLength.LIMITED is
deprecated, MaxFieldLength is deprecated, Version cannot be resolved to a
variable.******!!!*****

    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount(); 

*****!!!******The method docCount() is undefined for the type
IndexWriter******!!!*****

    writer.optimize();
    writer.close();
    return numIndexed;
  }

  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {

    File[] files = dir.listFiles();

    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);  // recurse
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  private static void indexFile(IndexWriter writer, File f)
    throws IOException {

    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f)));

*****!!!******The method Text(String, FileReader) is undefined for the type
Field******!!!*****

    doc.add(Field.Keyword("filename", f.getCanonicalPath()));


*****!!!******The method Keyword(String, String) is undefined for the type
Field******!!!*****

    writer.addDocument(doc);
  }
}

--
View this message in context: http://lucene.472066.n3.nabble.com/Extreme-Beginner-Needs-Lucene-Help-tp3378560p3380039.html
Sent from the Lucene - General mailing list archive at Nabble.com.

RE: Extreme Beginner Needs Lucene Help

Posted by "Sendros, Jason" <Ja...@VerizonWireless.com>.
Looks like you're trying to write code using old Lucene syntax with the
newest available jar.

Your options are to learn the new Lucene API:
http://lucene.apache.org/java/3_4_0/api/all/overview-summary.html
or to use an older version of Lucene that supports the things you're
trying to do: http://archive.apache.org/dist/lucene/java/2.9.4/

Try sticking with Java 6 for now. You will avoid plenty of headaches!

I might suggest using the older version for now since it seems your
tutorials and learning guides are using these older versions. Once you
learn enough about Lucene, you can migrate your code to a newer version
of Lucene.

Jason
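
To check a "constructor is undefined" error against whatever jar is really on the classpath, a small reflection sketch like the one below can list the public constructors a class offers. It uses only the standard library; the class name ConstructorLister and the method publicConstructors are made up for illustration. It defaults to java.lang.StringBuilder so it runs without Lucene; pass org.apache.lucene.index.IndexWriter as the argument to inspect the Lucene class. As far as I know, the 3.x constructors take a Directory rather than a String path, which would explain the error, but the listing from your own jar is the authority.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
class ConstructorLister {

    /** Returns the public constructor signatures of the named class, sorted. */
    static List<String> publicConstructors(String className) {
        Class<?> c;
        try {
            c = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(className + " is not on the classpath", e);
        }
        List<String> out = new ArrayList<String>();
        for (Constructor<?> ctor : c.getConstructors()) {
            StringBuilder sb = new StringBuilder(c.getSimpleName()).append('(');
            Class<?>[] params = ctor.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            out.add(sb.append(')').toString());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.lang.StringBuilder";
        for (String signature : publicConstructors(name)) {
            System.out.println(signature);
        }
    }
}
```

If the signature your code calls is not in the list, the tutorial was written for a different Lucene version than the jar you have.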

