Posted to java-user@lucene.apache.org by "Pierce, Tania" <tp...@cov.com> on 2004/01/15 17:35:37 UTC

lucene not indexing under apache 2.0/windows?

Let me preface this by saying I am a total beginner to
apache/java/tomcat/cocoon etc.  I'm thankfully fluent in xml/xslt or
this would be a nightmare.

Anyway, I have been given the task to figure out why one of our sites
continues to chew up memory and never releases it to the point where I
have to stop and start the tomcat service on a daily basis.  We're using
tomcat 4.1.24, j2se 1.4.1.04 on a win2k server (isapi redirect through
iis).  Our site is made up of a repository of xml docs (2,000 or so)
which get chewed up and spit out as html thanks to transforms set forth
in our cocoon pipeline.   We have lucene in place to create large xml
files (in memory) so that certain web pages don't have to loop through
hundreds of smaller xml files; instead, the xslt loops through the nodes
contained in the in-memory xml doc that's created for us by lucene.

So my manager had me set up a mirror site on a different machine running
all of the above EXCEPT IIS; our web server is Apache 2.0 (to rule
out IIS, which I don't think is the issue anyway).   Everything on this
mirror site works except lucene.  I can rebuild the lucene index by
running a .bat file our vendor wrote for us and it runs w/o error.
However, when I take a look at the resulting aggregate xml docs
(cached), they're empty.  To top it off, the cocoon pipeline seems to be
trying to apply our xsl templates to the cached xml docs... There are no
errors in any of the log files.

Any ideas?  What do I need to do (as clearly as you can please, I have
just enough knowledge on all this java/apache/tomcat/cocoon to be
dangerous) to get lucene to write out the index to memory?  It's running
through the docs it should be indexing (I can watch the output to the
cmd screen).   This all works fine on our live site, I literally copied
over the webapps directory and a few tomcat/cocoon files (web.xml,
cocoon.xconf, etc).  I can say that w/ the exception of IIS/isapi
redirect, the set up and files are all identical... 

Hope that makes sense.

Huge thanks,
T.


Help on IOException and FileNotFoundException (synchronization issue)

Posted by Ardor Wei <ar...@yahoo.com>.
Hi, experts, 

I am new to Lucene. I am trying to fix bugs in
existing code. I read the Lucene 1.3 final docs (some of
the API) and searched the related threads in the mailing
list archive, but I still couldn't solve the problem, even
though I know it might be related to synchronization
issues. Typically I encounter three types of problems:
couldn't delete, file not found, and lock obtain timeout.
Here are some exception stacks (sorry for the long
post):
java.io.IOException: couldn't delete _17.fdt
        at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
        at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
        at com.panva.lucene.ProfileIndexer.<init>(ProfileIndexer.java:47)
        at com.panva.lucene.ProfileDBIndexer.createIndex(ProfileDBIndexer.java:67)
        at com.panva.lucene.MainIndexScheduler.createMainSearchIndex(MainIndexScheduler.java:99)
        at com.panva.lucene.MainIndexScheduler.run(MainIndexScheduler.java:60)

java.io.FileNotFoundException: C:\lucenesource\index\_17.f1 (The system cannot find the file specified)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:200)
        at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:389)
        at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:418)
        at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:291)
        at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:388)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:151)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:423)
        at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:401)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:260)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:244)
        at com.panva.lucene.ProfileIndexer.addProfile(ProfileIndexer.java:89)
        at com.panva.lucene.ProfileDBIndexer.createIndex(ProfileDBIndexer.java:72)
        at com.panva.lucene.MainIndexScheduler.createMainSearchIndex(MainIndexScheduler.java:99)
        at com.panva.lucene.MainIndexScheduler.run(MainIndexScheduler.java:60)

java.io.IOException: Lock obtain timed out
        at org.apache.lucene.store.Lock.obtain(Lock.java:97)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
        at com.panva.lucene.ProfileIndexer.<init>(ProfileIndexer.java:47)
        at com.panva.lucene.ProfileDBIndexer.createIndex(ProfileDBIndexer.java:67)
        at com.panva.lucene.IndexScheduler.createRealTimeSearchIndex(IndexScheduler.java:103)
        at com.panva.lucene.IndexScheduler.run(IndexScheduler.java:63)
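The third trace fails inside Lock.obtain while an IndexWriter is being constructed, which suggests another writer already held the index's write lock. As a rough illustration only, here is a minimal Java sketch of a lock-file scheme with a timeout. The class name LockFileSketch, the lock file name and the 50 ms poll interval are assumptions; this is not Lucene's Lock class, it only reproduces the failure mode of a second writer that waits and then gives up.

```java
import java.io.File;
import java.io.IOException;

/**
 * Hypothetical sketch of a lock file with a timeout, written to illustrate
 * the "Lock obtain timed out" trace. NOT Lucene's own Lock implementation;
 * the names and the poll interval are assumptions.
 */
class LockFileSketch {
    private static final long POLL_INTERVAL_MS = 50; // assumed value

    private final File lockFile;

    LockFileSketch(File dir, String name) {
        this.lockFile = new File(dir, name);
    }

    /** Tries to create the lock file, polling until timeoutMs has elapsed. */
    void obtain(long timeoutMs) throws IOException {
        long waited = 0;
        // createNewFile() is atomic: it returns false if the file already exists.
        while (!lockFile.createNewFile()) {
            if (waited >= timeoutMs) {
                throw new IOException("Lock obtain timed out");
            }
            try {
                Thread.sleep(POLL_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for lock");
            }
            waited += POLL_INTERVAL_MS;
        }
    }

    /** Deletes the lock file so that another writer can proceed. */
    void release() {
        lockFile.delete();
    }
}
```

Because createNewFile is atomic, two writers cannot both succeed; a writer that is never closed never releases its lock, so every later attempt on the same directory times out.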


In my application, multiple threads are writing and
searching. Here is the code snippet (not complete, but
it should be enough):

// ProfileDBIndexer.java
public class ProfileDBIndexer
{

  public static void createIndex(String path, String sqlStmt)
      throws Throwable
  {  // blah, blah
    try
    {
      // DB code
      rs = stmt.executeQuery(sqlStmt);

      indexer = new ProfileIndexer(path, true);

      while (rs.next())
      {

        Profile profile = getProfileFromResultSet(rs);
        // If I wrap this call in synchronized(indexer) and call
        // writer.close() in addProfile() below, a
        // NullPointerException is thrown. It looks like
        // writeLock.release() in IndexWriter.close() throws it.
        indexer.addProfile(profile);
        noOfRecordsProcessed++;
      }
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
    finally
    {
      // close DB connection
    }
  }
}

// ProfileIndexer.java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ProfileIndexer {
  IndexWriter writer;

  public ProfileIndexer(String path, boolean create)
      throws IOException {
    Analyzer analyzer = new AlphanumStopAnalyzer();
    writer = new IndexWriter(path, analyzer, create);
  }

  public void addProfile(Profile profile) throws IOException {
    Document document = new Document();

    document.add(Field.Keyword("Username", profile.getUsername()));
    // ... many more document.add() calls here

    writer.addDocument(document);
    // writer.optimize();
    // writer.close();
  }
}
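One way to rework the snippet above, sketched under assumptions: open one writer per rebuild, add every document through it, and close it exactly once in a finally block, rather than never closing it (as in the posted code) or closing it after each document (the variant that threw the NullPointerException). To keep the example self-contained, the hypothetical interfaces DocWriter and DocWriterFactory stand in for IndexWriter and for new IndexWriter(path, analyzer, true); they are not Lucene types.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical stand-in for IndexWriter: only the two calls the sketch needs. */
interface DocWriter {
    void addDocument(String doc) throws IOException;
    void close() throws IOException;
}

/** Hypothetical stand-in for "new IndexWriter(path, analyzer, true)". */
interface DocWriterFactory {
    DocWriter open(String path) throws IOException;
}

class RebuildSketch {
    /**
     * Opens one writer, adds every document, and closes the writer exactly
     * once in a finally block, so its lock is released even if
     * addDocument throws part-way through.
     */
    static int rebuild(DocWriterFactory factory, String path, List<String> docs)
            throws IOException {
        DocWriter writer = factory.open(path);
        int added = 0;
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
                added++;
            }
        } finally {
            writer.close(); // one close per open, never one per document
        }
        return added;
    }
}
```

With the real classes, the same shape would wrap the while (rs.next()) loop in createIndex, with the writer's close() in the finally block next to the DB cleanup.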


In the thread class, the following method is called
frequently:
ProfileDBIndexer.createIndex(indexPath, indexQuery);

In my application, the index searcher is driven by client
requests and is not multi-threaded. It doesn't delete index
files, and no synchronization is used.
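The stack traces show two different schedulers, MainIndexScheduler.createMainSearchIndex and IndexScheduler.createRealTimeSearchIndex, both calling ProfileDBIndexer.createIndex, which opens an IndexWriter with create set to true. The first and third traces also suggest that the IndexWriter constructor wipes the directory (FSDirectory.create) before it obtains the write lock, so if both schedulers target the same path at once, one can delete files the other is still merging, which would be consistent with the couldn't-delete and file-not-found errors. A minimal sketch of serializing those calls inside one JVM follows; SerializedRebuilder, INDEX_LOCK and the counters are hypothetical names, and the sleep stands in for the real indexing work.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: every thread that rebuilds the index goes through one
 * static lock, so at most one writer is open on the directory at a time.
 */
class SerializedRebuilder {
    // One lock shared by all schedulers running in this JVM.
    private static final Object INDEX_LOCK = new Object();

    // Demo instrumentation: rebuilds running right now, and the highest
    // number ever observed running at the same time.
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger maxActive = new AtomicInteger();

    static void createIndex(String path) throws InterruptedException {
        synchronized (INDEX_LOCK) {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max);
            try {
                // Stand-in for the real work: open the writer on path,
                // add the documents, close it in a finally block.
                Thread.sleep(20);
            } finally {
                active.decrementAndGet();
            }
        }
    }
}
```

A synchronized block only coordinates threads in the same JVM; a separate process writing to the same index directory would need to be kept out by other means.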

I tried adding synchronization to some methods, but
it didn't work out. I know I haven't found the real
problem. I am lost.

Could you help me or give me any suggestions?  Thanks a
lot in advance!

Ardor Wei


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: lucene not indexing under apache 2.0/windows?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
You're missing something in your explanation.  Lucene does not create 
XML files.
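To illustrate the distinction: Lucene stores and searches an index, and any aggregate XML that the XSLT consumes has to be built by application code (here, presumably the vendor's) from the search results. In Lucene 1.x that code would loop over the Hits returned by IndexSearcher.search, calling hits.doc(i) and Document.get(field). The minimal sketch below skips Lucene entirely and uses a List of Maps as a hypothetical stand-in for those documents; HitsToXml and its element names are assumptions.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the application-side step: turning search results
 * into one aggregate XML string. Each Map stands in for a Lucene Document
 * (field name -> stored value); field names are assumed to be valid XML
 * element names.
 */
class HitsToXml {
    static String toXml(String rootName, List<Map<String, String>> hits) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootName).append('>');
        for (Map<String, String> hit : hits) {
            xml.append("<doc>");
            for (Map.Entry<String, String> field : hit.entrySet()) {
                xml.append('<').append(field.getKey()).append('>')
                   .append(escape(field.getValue()))
                   .append("</").append(field.getKey()).append('>');
            }
            xml.append("</doc>");
        }
        xml.append("</").append(rootName).append('>');
        return xml.toString();
    }

    /** Escapes characters that may not appear raw in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

An empty result list still yields a well-formed but empty document and no error, so a search that returns no hits (for example, against an empty index) could plausibly produce empty cached files without anything in the logs; the thread itself does not confirm this.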



