Posted to user@commons.apache.org by Luke Shannon <ls...@hypermedia.com> on 2004/11/12 00:39:58 UTC

Lucene : avoiding locking

Thank you for the tip, Craig. I am new to Lucene and to user support groups,
and I am not the most experienced programmer. To be honest, I am starting to
feel a little over my head with this project.

My company has a content management product. Each time someone changes the
directory structure or a file within it, that portion of the site needs to
be re-indexed so the changes are reflected in future searches (indexing must
happen at run time).

I have written an Indexer class with a static Index() method. The idea is to
call the method every time something changes and the index needs to be
re-examined. I am hoping the logic put in by Doug Cutting surrounding the
UID will make indexing efficient enough to be called that frequently.
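
For context, this is roughly where the call happens (a simplified sketch;
SearchEventProcessor and ContentNodeDeleteEvent are the real names you will
see in the log below, everything else here is illustrative):

public class SearchEventProcessor {
 public void visit(ContentNodeDeleteEvent event) {
  //every content change kicks off a full pass of the indexer
  Indexer.Index();
 }
}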

This class works great when I test it on my own little site (I have about
2000 files). But when I drop the functionality into the QA environment I get
a locking error.

I can't access the stack trace; all I can get at is a log file the
application writes to. Here is the section my class wrote. It was right in
the middle of indexing when the lock issue hit.

I don't know if the problem is in my code or something in the existing
application.

Error Message:
ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
|INFO|INDEXING INFO: Start Indexing new content.
|INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING INFO: Beginning Incremental update comparisons
|INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:
Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
|ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
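
One thing I could try before each run (just a sketch from memory, I have not
verified this is the right approach) is checking whether Lucene still holds
the write lock:

 //sketch: pre-flight check before starting another indexing pass;
 //IndexReader.isLocked is from memory, the 1.4 signature may differ
 if (IndexReader.isLocked(indexFileLocation)) {
  Trace.TRACE("INDEXING INFO: Index is locked, skipping this pass");
  return;
 }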

Here is my code. You will recognize it as pretty much the IndexHTML class
from the Lucene demo written by Doug Cutting. I have put in a ton of comments
in an attempt to understand what is going on.

Any help would be appreciated.

Luke

package com.fbhm.bolt.search;

/*
 * Created on Nov 11, 2004
 *
 * This class will create a single index file for the Content
 * Management System (CMS). It contains logic to ensure
 * indexing is done "intelligently". Based on IndexHTML.java
 * from the demo folder that ships with Lucene
 */

import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.pdfbox.searchengine.lucene.LucenePDFDocument;
import org.apache.lucene.demo.HTMLDocument;

import com.alaia.common.debug.Trace;
import com.alaia.common.util.AppProperties;

/**
 * @author lshannon Description: <br>
 *         This class is used to index a content folder. It contains logic to
 *         ensure only new documents, or documents modified since the last
 *         indexing run, are indexed. <br>
 *         Based on code written by Doug Cutting in the IndexHTML class found
 *         in the Lucene demo
 */
public class Indexer {
 //true during deletion pass, this is when the index already exists
 private static boolean deleting = false;

 //object to read the existing index
 private static IndexReader reader;

 //object to write to the index folder
 private static IndexWriter writer;

 //iterator over the uid terms of the existing index
 private static TermEnum uidIter;

 /*
  * This static method does all the work; the end result is an up-to-date
  * index folder
  */
 public static void Index() {
  //assume we need to create a new index until we find an existing one
  boolean create = true;
  //location of the index folder
  String indexFileLocation =
    AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
  //location of the content folder
  String contentFolderLocation =
    AppProperties.getPropertyAsString("site.root");
  //determine whether the index needs to be created or not
  File index = new File(indexFileLocation);
  File root = new File(contentFolderLocation);
  //the index folder exists, so we need an incremental update of the index
  if (index.exists()) {
   Trace.TRACE("INDEXING INFO: An index folder exists at: "
     + indexFileLocation);
   deleting = true;
   create = false;
   try {
    //this version of index docs is able to execute the incremental
    // update
    indexDocs(root, indexFileLocation, create);
   } catch (Exception e) {
    //we were unable to do the incremental update
    Trace.TRACE("INDEXING ERROR: Unable to execute incremental update "
        + e.getMessage());
   }
   //after this block the index should be current with the content
   Trace.TRACE("INDEXING INFO: Incremental update completed.");
  }
  try {
   //create the writer
   writer = new IndexWriter(index, new StandardAnalyzer(), create);
   //configure the writer
   writer.mergeFactor = 10000;
   writer.maxFieldLength = 100000;
   try {
    //get the start date
    Date start = new Date();
    //call the indexDocs method, this time we will add new
    // documents
    Trace.TRACE("INDEXING INFO: Start Indexing new content.");
    indexDocs(root, indexFileLocation, create);
    Trace.TRACE("INDEXING INFO: Indexing new content complete.");
    //optimize the index
    writer.optimize();
    //close the writer
    writer.close();
    //get the end date
    Date end = new Date();
    long totalTime = end.getTime() - start.getTime();
    Trace.TRACE("INDEXING INFO: All Indexing Operations Completed in "
        + totalTime + " milliseconds");
   } catch (Exception e1) {
    //unable to add new documents
    Trace.TRACE("INDEXING ERROR: Unable to index new content "
        + e1.getMessage());
   }
  } catch (IOException e) {
   Trace.TRACE("INDEXING ERROR: Unable to create IndexWriter "
     + e.getMessage());
  }
 }

 /*
  * Walk the directory hierarchy in uid order, while keeping the uid
  * iterator from the existing index in sync. Mismatches indicate one of:
  * (a) old documents, to be deleted; (b) unchanged documents, to be left
  * alone; or (c) new documents, to be indexed.
  */

 private static void indexDocs(File file, String index, boolean create)
   throws Exception {
  //the index already exists, so we do an incremental update
  if (!create) {
   Trace.TRACE("INDEXING INFO: Incremental Update Request Confirmed");
   //open existing index
   reader = IndexReader.open(index);
   //this gets an enumeration of uid terms
   uidIter = reader.terms(new Term("uid", ""));
   //jump to the indexDocs method that does the work;
   //it uses the iterator above to do all the "smart" indexing
   indexDocs(file);
   //this will be true every time the index already existed;
   //any uids still left in the iterator belong to files that are gone,
   //so those documents are deleted
   if (deleting) {
    Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Started. All
Deleted Docs will be listed.");
    while (uidIter.term() != null
      && uidIter.term().field() == "uid") {
      //anything still in the iterator was indexed before but
      //no longer exists on disk, so remove it
     Trace.TRACE("INDEXING INFO: Deleting document "
       + HTMLDocument.uid2url(uidIter.term().text()));
     //delete the term from the reader
     reader.delete(uidIter.term());
      //advance to the next term
     uidIter.next();
    }
    Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Completed");
    //turn off the deleting flag
    deleting = false;
   }//close the deleting branch
    //close the enumeration
   uidIter.close(); // close uid iterator
   //close the reader
   reader.close(); // close existing index

  }
  //we come here if the index did not already exist
  else {
   Trace.TRACE("INDEXING INFO: Index Folder Did Not Exist. Start Creation Of
New Index");
   // don't have exisiting
   indexDocs(file);
  }
 }

 private static void indexDocs(File file) throws Exception {
  //check whether we are looking at a directory
  if (file.isDirectory()) {
   //get a list of the files
   String[] files = file.list();
   //sort them
   Arrays.sort(files);
   //index each file in the directory recursively
   //we keep repeating this logic until we hit a
   //file
   for (int i = 0; i < files.length; i++)
    //pass in the parent directory and the current file
    //into the file constructor and index
    indexDocs(new File(file, files[i]));

  }
  //we have an actual file, so we need to consider the
  //file extensions so the correct Document is created
  else if (file.getPath().endsWith(".html")
    || file.getPath().endsWith(".htm")
    || file.getPath().endsWith(".txt")
    || file.getPath().endsWith(".doc")
    || file.getPath().endsWith(".xml")
    || file.getPath().endsWith(".pdf")) {

   //if this is reached it means we were in the midst
   //of an incremental update
   if (uidIter != null) {
    //get the uid for the document we are on
    String uid = HTMLDocument.uid(file);
     //now compare this document to the one we have in the
     //enumeration of terms;
     //if the term in the enumeration is less than the
     //term we are on, it must be deleted (if we are indeed
     //doing an incremental update)
     Trace.TRACE("INDEXING INFO: Beginning Incremental update comparisons");
    while (uidIter.term() != null
      && uidIter.term().field() == "uid"
      && uidIter.term().text().compareTo(uid) < 0) {
     //delete stale docs
     if (deleting) {
      reader.delete(uidIter.term());
     }
     uidIter.next();
    }
    //if the terms are equal there is no change with this document
    //we keep it as is
    if (uidIter.term() != null && uidIter.term().field() == "uid"
      && uidIter.term().text().compareTo(uid) == 0) {
     uidIter.next();
    }
    //if we are not deleting and the document was not there
    //it means we didn't have this document on the last index
    //and we should add it
    else if (!deleting) {
     if (file.getPath().endsWith(".pdf")) {
      Document doc = LucenePDFDocument.getDocument(file);
      Trace.TRACE("INDEXING INFO: Adding new document to the existing index:
"
          + doc.get("url"));
      writer.addDocument(doc);
     } else if (file.getPath().endsWith(".xml")) {
      Document doc = XMLDocument.Document(file);
      Trace.TRACE("INDEXING INFO: Adding new document to the existing index:
"
          + doc.get("url"));
      writer.addDocument(doc);
     } else {
      Document doc = HTMLDocument.Document(file);
      Trace.TRACE("INDEXING INFO: Adding new document to the existing index:
"
          + doc.get("url"));
      writer.addDocument(doc);
     }
    }
   }//end the if for an incremental update
   //we are creating a new index, add all document types
   else {
    if (file.getPath().endsWith(".pdf")) {
     Document doc = LucenePDFDocument.getDocument(file);
     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
         + doc.get("url"));
     writer.addDocument(doc);
    } else if (file.getPath().endsWith(".xml")) {
     Document doc = XMLDocument.Document(file);
     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
         + doc.get("url"));
     writer.addDocument(doc);
    } else {
     Document doc = HTMLDocument.Document(file);
     Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
         + doc.get("url"));
     writer.addDocument(doc);
    }//close the else
   }//close the else for a new index
  }//close the else if to handle file types
 }//close the indexDocs method

}


----- Original Message ----- 
From: "Craig McClanahan" <cr...@gmail.com>
To: "Jakarta Commons Users List" <co...@jakarta.apache.org>
Sent: Thursday, November 11, 2004 6:13 PM
Subject: Re: avoiding locking


> In order to get any useful help, it would be nice to know what you are
> trying to do, and (most importantly) what commons component is giving
> you the problem :-).  The traditional approach is to put a prefix on
> your subject line -- for commons package "foo" it would be:
>
>   [foo] avoiding locking
>
> It's also generally helpful to see the entire stack trace, not just
> the exception message itself.
>
> Craig
>
>
> On Thu, 11 Nov 2004 17:27:19 -0500, Luke Shannon
> <ls...@hypermedia.com> wrote:
> > What can I do to avoid locking issues?
> >
> > Unable to execute incremental update Lock obtain timed out:
> > Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> >
> > Thanks,
> >
> > Luke
> >





Re: Lucene : avoiding locking

Posted by Luke Shannon <ls...@hypermedia.com>.
Thank you.

Sorry all, I posted this on the wrong list. Please disregard the issue. I'm
on the Lucene list now and my problem has been resolved.






Re: Lucene : avoiding locking

Posted by Marcus Beyer <mb...@stormlight.de>.
On 12.11.2004 at 00:39, Luke Shannon wrote:

> Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)

AFAIR the default lock obtain timeout is quite small (1 second?).
There is a parameter you can give the VM on start; I don't remember the
name.
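
If you want to experiment, it was something like this (untested, from
memory -- I may be misremembering the exact property name and field, so
check the Lucene 1.4 javadocs):

  // on VM start (value is in milliseconds):
  //   java -Dorg.apache.lucene.writeLockTimeout=10000 ...
  //
  // or programmatically, before opening any IndexWriter:
  org.apache.lucene.index.IndexWriter.WRITE_LOCK_TIMEOUT = 10000;
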
BTW: Craig is right. This list is not about Lucene ...

Regards,
Marcus

<http://www.Stormlight.de>

"Transhumanism has turned the striving for transcendence
  from its head onto its feet." -- Torsten Nahm




Re: Lucene : avoiding locking

Posted by Craig McClanahan <cr...@gmail.com>.
On Thu, 11 Nov 2004 18:39:58 -0500, Luke Shannon
<ls...@hypermedia.com> wrote:
> Thank you for the tip, Craig. I am new to Lucene and to user support
> groups, and I am not the most experienced programmer. To be honest, I am
> starting to feel a little over my head with this project.
> 

I know how you feel ... it can be overwhelming when you first enter
the wide and wonderful world of open source ... :-)

As it happens, each of the subprojects at Jakarta (including Lucene)
has its own mailing lists for users and developers.  This particular
list is for the Jakarta Commons packages, which are a bunch of small
reusable libraries.  I've cc'd this message to the user list for
Lucene -- it will need to be approved by a moderator before it's
posted -- where you should be able to find people who know a lot more
about Lucene than I do.

You'll likely want to subscribe to the Lucene User list yourself in
order to ask and answer direct questions about Lucene.  To do so, send
an empty message to <lu...@apache.org>.  Info about
all the mailing lists at Jakarta can be found at:

  http://jakarta.apache.org/site/mail.html

Good luck!

Craig


