Posted to java-user@lucene.apache.org by "Daniel B. Davis" <db...@smart.net> on 2004/02/15 21:55:36 UTC

Intermediate indexing before final

I am a newbie to Lucene, and have been learning by experiment
and from the demos.  A problem has arisen when indexing a
document after it is created but before it is added to the
permanent index.  The document is first indexed into a small
lookaside index in order to determine whether it is
"sponsored" [i.e. contains any word that causes it to
be included in one of the 'sponsored' document levels].
(A separate letter deals with the larger issues of
sponsorship.)  If it is sponsored, then setBoost is called
on the document with a level-dependent value.

The code in question is in IndexHTML, near:
	doc = new HTMLDocument(file);
	writer.addDocument(doc);

In the case at issue, this code has been changed to:
	doc = new HTMLDocument(file);
	int boost = sponsoredValue(doc);
	doc.setBoost(boost);
	writer.addDocument(doc);

The sponsoredValue method never returns.

The exception (shown below) occurs after a longish delay,
about 2-3 seconds, when run in Eclipse.  The document used is:
           http://www.w3.org/TR/xquery
stored as a local file.  The same document indexes
correctly when the calls to sponsoredValue and setBoost
are removed.

HTMLDocument was modified in minor ways.  HTMLParser
is destined for modification, but is still vanilla.

Note that switching from RAMDirectory to FSDirectory
does not change the behavior.

I greatly appreciate any help.  Thank you all.

  -------------------------------------------------

the Document doc:
       url: Keyword, string
       file: Unindexed, string
       modified: Keyword, string
       uid: as in HTMLdemo, string
       contents: Text, reader
       title: Text, string
       metadata: Text, string

the code:

   private static RAMDirectory ramDir = null;
   private static IndexWriter ramWriter = null;
   private static IndexReader ramReader = null;
   private static IndexSearcher ramSearcher = null;

   public int sponsoredValue(Document doc) {
       .
       .
       .
       ramDir = new RAMDirectory();
       ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
+-->  ramWriter.addDocument(doc);
|     ramWriter.close();
|     ramWriter = null;
|     ramReader = IndexReader.open(ramDir);
|     ramSearcher = new IndexSearcher(ramReader);
|     .
|     .
|     .
|     }
|
the Exception:

java.io.IOException: Pipe closed
	at java.io.PipedInputStream.receive(Unknown Source)
	at java.io.PipedInputStream.receive(Unknown Source)
	at java.io.PipedOutputStream.write(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
	at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
	at sun.nio.cs.StreamEncoder.write(Unknown Source)
	at sun.nio.cs.StreamEncoder.write(Unknown Source)
	at java.io.OutputStreamWriter.write(Unknown Source)
	at java.io.Writer.write(Unknown Source)
	at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
	at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
	at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)









Re: Intermediate indexing before final

Posted by "Daniel B. Davis" <db...@smart.net>.
At 03:55 PM 2/15/2004 -0500, you wrote:
>Daniel B. Davis

Found the problem, and the solution raises questions.  The problem arose
from the contents field, which was reader-valued.  The reader is a pipe,
which had been closed, hence the fault.
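
For readers unfamiliar with this failure mode, here is a minimal sketch
using only java.io, not Lucene or the demo's HTMLParser.  It assumes a
producer that writes parsed text into a pipe while a consumer reads
from the other end; the class and method names are made up for
illustration.  If the read end is closed while the producer still has
text to send, the next write fails with an IOException, which matches
the shape of the trace in the original post.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipeClosedDemo {

    /**
     * Returns true if writing to a pipe fails with an IOException
     * after the reading end has been closed.
     */
    public static boolean writeFailsAfterReaderClosed() throws IOException {
        PipedWriter producer = new PipedWriter();
        PipedReader consumer = new PipedReader(producer);

        // Small enough to fit in the pipe's buffer, so this does not block.
        producer.write("some parsed text");

        // The consumer takes one character, then closes its end early.
        consumer.read();
        consumer.close();

        try {
            // The producer is still going and tries to send more text.
            producer.write("more parsed text");
            return false;
        } catch (IOException e) {
            // The read end is gone, so the write is rejected.
            return true;
        } finally {
            producer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write failed after reader closed: "
                + writeFailsAfterReaderClosed());
    }
}
```

Whether the consumer in the real case closed the pipe early or had
already finished with it is not something this sketch can show; it only
illustrates that the writer is the side that sees the exception.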

This raises questions about reader-valued fields:
    * Do they have a life-cycle?
    * Can reader-valued fields be used in situations different from where
      they were created?
    * What causes them to close?
    * Can they be used multiple times, like a String?
    * Can they be reset for re-use?
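
On the last few questions, the plain java.io behavior can be sketched
without Lucene.  The names below are hypothetical, and how Lucene
itself treats a reader-valued field is not shown here.  A Reader is a
one-pass stream: once drained it yields nothing more, and reset() is
only usable when markSupported() returns true, which it does not for a
PipedReader.  One possible workaround, assuming the text fits in
memory, is to drain the reader into a String once and hand a fresh
StringReader to each consumer.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.Reader;
import java.io.StringReader;

public class ReaderReuseDemo {

    /** Reads everything the reader has left and returns it as a String. */
    public static String drain(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents reader; the text is arbitrary.
        Reader once = new StringReader("for $x in doc return $x");

        String firstPass = drain(once);   // consumes the whole stream
        String secondPass = drain(once);  // nothing is left to read

        System.out.println("first pass length:  " + firstPass.length());
        System.out.println("second pass empty:  " + secondPass.isEmpty());
        System.out.println("pipe supports mark: " + new PipedReader().markSupported());

        // A String can be reused: wrap it in a new reader per consumer.
        String again = drain(new StringReader(firstPass));
        System.out.println("fresh reader equal: " + again.equals(firstPass));
    }
}
```

Buffering the whole document trades memory for the ability to read the
text more than once, which may or may not suit large files.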