Posted to java-user@lucene.apache.org by tales <ne...@web.de> on 2011/04/24 21:17:24 UTC

General Questions and some Problems

Hello everyone,

My name is Lars. I'm new to Java and especially to Lucene. To give you some
context, here is briefly what I'm trying to do:

I'm a student from Germany, currently working at our library as a student
worker. I study Bioinformatics/Biosystems Engineering, and because of that
the head of our library asked me to help him with some technical things on
our homepage. We currently have only a MySQL-based search script for our
users and now want to add full-text search to improve the quality of the
search results. My idea was to use Lucene, which I think is the best way.
I don't want to use the Zend Framework for that; I need to program in Java
in any case. I want to build a small program that runs in the background,
with a frontend for the users.

Here is what I have done so far:

1. I downloaded Lucene 3.1.0 core (I think it's currently the newest version?)
2. I searched the web for a small tutorial to get into the material and
found the page http://www.lucenetutorial.com
3. I copy-pasted the example code from the page to see whether I can run
it as is

Here is my first problem: I'm not able to compile the code. The problem lies
in the imported packages. I tried to compile the code with the following
command line:

javac TextFileIndexer.java -classpath ../Lucene/lucene-core-3.1.0/org/apache/lucene

I tried multiple paths to the Lucene package, but the result was always the
same:


TextFileIndexer.java:3: package org.apache.lucene.analysis.standard does not exist
import org.apache.lucene.analysis.standard.StandardAnalyzer;
                                          ^
TextFileIndexer.java:4: package org.apache.lucene.document does not exist
import org.apache.lucene.document.Document;
                                 ^
TextFileIndexer.java:5: package org.apache.lucene.document does not exist
import org.apache.lucene.document.Field;
                                 ^
TextFileIndexer.java:6: package org.apache.lucene.index does not exist
import org.apache.lucene.index.IndexWriter;
                              ^
TextFileIndexer.java:17: cannot find symbol
symbol  : class IndexWriter
location: class com.lucenetutorial.apps.TextFileIndexer
	private IndexWriter writer;
	        ^
TextFileIndexer.java:69: cannot find symbol
symbol  : class IndexWriter
location: class com.lucenetutorial.apps.TextFileIndexer
		writer = new IndexWriter(indexDir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
		             ^
TextFileIndexer.java:69: cannot find symbol
symbol  : class StandardAnalyzer
location: class com.lucenetutorial.apps.TextFileIndexer
		writer = new IndexWriter(indexDir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
		                                       ^
TextFileIndexer.java:69: package IndexWriter does not exist
		writer = new IndexWriter(indexDir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
		                                                                            ^
TextFileIndexer.java:89: cannot find symbol
symbol  : class Document
location: class com.lucenetutorial.apps.TextFileIndexer
				Document doc = new Document();
				^
TextFileIndexer.java:89: cannot find symbol
symbol  : class Document
location: class com.lucenetutorial.apps.TextFileIndexer
				Document doc = new Document();
				                   ^
TextFileIndexer.java:95: cannot find symbol
symbol  : class Field
location: class com.lucenetutorial.apps.TextFileIndexer
				doc.add(new Field("contents", fr));
				            ^
TextFileIndexer.java:100: cannot find symbol
symbol  : class Field
location: class com.lucenetutorial.apps.TextFileIndexer
				doc.add(new Field("path", fileName,
				            ^
TextFileIndexer.java:101: package Field does not exist
								  Field.Store.YES,
								       ^
TextFileIndexer.java:102: package Field does not exist
								  Field.Index.NOT_ANALYZED));
								       ^
14 errors


So my questions:

Can you please help me a little and tell me why I'm not able to compile
the code?
Can you also tell me whether Lucene is the best choice for this task, or
should I use something built on Lucene, like Solr?
Which files do I need to ship with an application that uses Lucene?

Thank you very much for your help.

Best regards

Lars

P.S.:

Here is the code of TextFileIndexer.java:

package com.lucenetutorial.apps;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import java.io.*;
import java.util.ArrayList;

/**
 * This terminal application creates an Apache Lucene index in a folder and adds files into this index
 * based on the input of the user.
 */
public class TextFileIndexer {

  private IndexWriter writer;
  private ArrayList<File> queue = new ArrayList<File>();

  public static void main(String[] args) throws IOException {
    System.out.println("Enter the path where the index will be created: ");

    BufferedReader br = new BufferedReader(
            new InputStreamReader(System.in));
    String s = br.readLine();

    TextFileIndexer indexer = null;
    try {
      indexer = new TextFileIndexer(s);
    } catch (Exception ex) {
      System.out.println("Cannot create index..." + ex.getMessage());
      System.exit(-1);
    }

    //===================================================
    //read input from user until he enters q for quit
    //===================================================
    while (!s.equalsIgnoreCase("q")) {
      try {
        System.out.println("Enter the file or folder name to add into the index (q=quit):");
        System.out.println("[Acceptable file types: .xml, .htm, .html, .txt]");
        s = br.readLine();
        if (s.equalsIgnoreCase("q")) {
          break;
        }

        //try to add file into the index
        indexer.indexFileOrDirectory(s);
      } catch (Exception e) {
        System.out.println("Error indexing " + s + " : " + e.getMessage());
      }
    }

    //===================================================
    //after adding, we always have to call the
    //closeIndex, otherwise the index is not created    
    //===================================================
    indexer.closeIndex();
  }

  /**
   * Constructor
   * @param indexDir the name of the folder in which the index should be created
   * @throws java.io.IOException
   */
  TextFileIndexer(String indexDir) throws IOException {
    // the boolean true parameter means to create a new index every time,
    // potentially overwriting any existing files there.
    writer = new IndexWriter(indexDir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
  }

  /**
   * Indexes a file or directory
   * @param fileName the name of a text file or a folder we wish to add to the index
   * @throws java.io.IOException
   */
  public void indexFileOrDirectory(String fileName) throws IOException {
    //===================================================
    //gets the list of files in a folder (if user has submitted
    //the name of a folder) or gets a single file name (if user
    //has submitted only the file name) 
    //===================================================
    listFiles(new File(fileName));
    
    int originalNumDocs = writer.numDocs();
    for (File f : queue) {
      FileReader fr = null;
      try {
        Document doc = new Document();

        //===================================================
        // add contents of file
        //===================================================
        fr = new FileReader(f);
        doc.add(new Field("contents", fr));

        //===================================================
        //adding second field which contains the path of the file
        //===================================================
        doc.add(new Field("path", fileName,
                Field.Store.YES,
                Field.Index.NOT_ANALYZED));

        writer.addDocument(doc);
        System.out.println("Added: " + f);
      } catch (Exception e) {
        System.out.println("Could not add: " + f);
      } finally {
        if (fr != null) fr.close(); // fr stays null if the FileReader could not be opened
      }
    }
    
    int newNumDocs = writer.numDocs();
    System.out.println("");
    System.out.println("************************");
    System.out.println((newNumDocs - originalNumDocs) + " documents added.");
    System.out.println("************************");

    queue.clear();
  }

  private void listFiles(File file) {
    if (!file.exists()) {
      System.out.println(file + " does not exist.");
    }
    if (file.isDirectory()) {
      for (File f : file.listFiles()) {
        listFiles(f);
      }
    } else {
      String filename = file.getName().toLowerCase();
      //===================================================
      // Only index text files
      //===================================================
      if (filename.endsWith(".htm") || filename.endsWith(".html") || 
              filename.endsWith(".xml") || filename.endsWith(".txt")) {
        queue.add(file);
      } else {
        System.out.println("Skipped " + filename);
      }
    }
  }

  /**
   * Close the index.
   * @throws java.io.IOException
   */
  public void closeIndex() throws IOException {
    writer.optimize();
    writer.close();
  }
}

--
View this message in context: http://lucene.472066.n3.nabble.com/General-Questions-and-some-Problems-tp2858378p2858378.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: General Questions and some Problems

Posted by Em <ma...@yahoo.de>.
Hi Lars,

In short, without completely reading through your code, I suggest you use
Solr. Solr is better for beginners like you with little experience in
Lucene and Java, and it gives you many built-in options, from caches to
facets, out of the box.

All you need to run Solr is a servlet container (I prefer Tomcat over
Jetty for production, but some people say Jetty is okay too), and that's
it.

Regards,
Em



Re: General Questions and some Problems

Posted by tales <ne...@web.de>.
test

--
View this message in context: http://lucene.472066.n3.nabble.com/General-Questions-and-some-Problems-tp2858378p2858957.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: General Questions and some Problems

Posted by tales <ne...@web.de>.
bump

--
View this message in context: http://lucene.472066.n3.nabble.com/General-Questions-and-some-Problems-tp2858378p2858950.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: General Questions and some Problems

Posted by Simon Willnauer <si...@googlemail.com>.
I read through briefly, and it seems that the classpath is wrong. It should be:

javac TextFileIndexer.java -classpath ../Lucene/lucene-core-3.1.0/

provided this is the directory that contains the compiled classes under
org/apache/lucene/ (org/apache/lucene/*.class and **/*.class).
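
To illustrate why the classpath entry has to be the root directory: for a
directory entry, the compiler turns the package name of a class such as
org.apache.lucene.index.IndexWriter into a relative path and looks for it
below that entry. The following is only a minimal sketch of that lookup for
directory entries (classes inside a JAR are resolved within the archive
instead). The class name ClasspathCheck and the helper classFileFor are made
up for this illustration, and the two paths are the ones from this thread.

```java
import java.io.File;

public class ClasspathCheck {

    // Hypothetical helper: maps a fully qualified class name to the .class
    // file that a directory classpath entry would have to contain.
    static File classFileFor(File classpathRoot, String className) {
        String relativePath = className.replace('.', File.separatorChar) + ".class";
        return new File(classpathRoot, relativePath);
    }

    public static void main(String[] args) {
        String className = "org.apache.lucene.index.IndexWriter";

        // Root directory as suggested above, and the deeper path from the original post.
        File root = new File("../Lucene/lucene-core-3.1.0");
        File tooDeep = new File("../Lucene/lucene-core-3.1.0/org/apache/lucene");

        for (File entry : new File[] {root, tooDeep}) {
            File expected = classFileFor(entry, className);
            // isFile() reports whether the class file is really where the lookup expects it.
            System.out.println(entry + " -> " + expected + " (exists: " + expected.isFile() + ")");
        }
    }
}
```

With the deeper entry, the lookup target ends in
org/apache/lucene/org/apache/lucene/index/IndexWriter.class, which does not
exist, so every Lucene import fails. If the download ships a JAR rather than
unpacked classes, the JAR file itself would be the classpath entry (for
example lucene-core-3.1.0.jar, assuming that is the file name in the
distribution).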

simon

