Posted to java-user@lucene.apache.org by Rupinder Singh Mazara <rs...@ebi.ac.uk> on 2004/10/05 20:31:40 UTC

leakage in RAMDirectory ?

Hi all,

The following is some code that I use to index the contents of a table
(there are 18746 records in the table). Using a database result set, I loop
over all the records, creating a Document object for each one and indexing
it into a RAMDirectory and then onto the file system.

When I open an IndexReader and output numDocs() I get 18740.

However, on running the same code but using an FSDirectory object, on
opening an IndexReader I get 18476.

Has anyone else come across this behaviour? The JDK being used is 1.4.1.


public class JournalIndexer extends  JournalConstants {
    IndexWriter ramWriter ;
    Directory ramDirectory;
    String dir;
    public JournalIndexer(String dir) throws  Exception{
        this.dir = dir;
        ramDirectory = new RAMDirectory();
        ramWriter = new IndexWriter( ramDirectory, new SimpleAnalyzer(), true );
    }

    public static void main(String args[]) throws Exception {
        Statement stmt   = connection.createStatement();
        JournalIndexer indexer = new JournalIndexer("journals");
        int main_counter = 0;
        // SELECT ID, JOURNALTITLE, NLM_ID, ISSN, MEDLINE_ABBREVIATION, ISO_ABBREVIATION, ESSN "+
        ResultSet rs = stmt.executeQuery(sqlFetchJournals);
        while(rs.next() ){
            Journal journal = new Journal();
         	///set values
            main_counter++;
            indexer.add( journal );
        }
        indexer.close();
    }

    int count = 0;

    public void add(Journal journal) throws Exception {
        Document  j_doc = new Document();
        // Field(String name, String string, boolean store, boolean index, boolean token)
        Field id     = new Field(ID,""+journal.getId(), true, true, false );
        j_doc.add( id );
        ramWriter.addDocument( j_doc );
         count++;

    }

    public void close() throws  Exception {
        IndexWriter fileWriter = new IndexWriter(
            FSDirectory.getDirectory(dir, true), new SimpleAnalyzer(), true);
        Directory dirs[] = { ramDirectory };
        fileWriter.addIndexes( dirs );
        fileWriter.optimize();
        fileWriter.close();
    }

   class JournalAnalyzer extends Analyzer {
     public TokenStream tokenStream(String field,Reader reader)  {
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new LowerCaseFilter(result);
        return  result;
     }
   }

}


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: leakage in RAMDirectory ?

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.
Hi,

The major issue is that when using FSDirectory and indexing to a directory
there are no missing entries, whereas when indexing using RAMDirectory I get
missing entries.

Currently I am investigating which entries are missing. Since the
application is configured to shut down in the event of an Exception, either
all get indexed or none.

  Rupinder

>-----Original Message-----
>From: Daniel Naber [mailto:daniel.naber@t-online.de]
>Sent: 06 October 2004 20:22
>To: Lucene Users List
>Subject: Re: leakage in RAMDirectory ?
>
>
>On Tuesday 05 October 2004 20:31, Rupinder Singh Mazara wrote:
>
>>  ( there
>> are 18746 records in the table. )
>>  using a database result set , i loop over all the records ,
>>  creating a document object and indexing into ramDirectory and then onto
>> the fileSystem
>>
>>  when I open a IndexReader and output numDoc i get 18740,
>
>It seems even in this case some documents are lost. Do you maybe ignore 
>exceptions? Could you build a self-contained test case that shows the 
>problem? The interesting question is of course *which* documents are lost 
>and if the behaviour is reproducible. The test case will either help you 
>to fix the bug in your code, or it will help us fix the bug in Lucene, if 
>there is any.
>
>Regards
> Daniel
>
>-- 
>http://www.danielnaber.de


Re: leakage in RAMDirectory ?

Posted by Daniel Naber <da...@t-online.de>.
On Tuesday 05 October 2004 20:31, Rupinder Singh Mazara wrote:

>  ( there
> are 18746 records in the table. )
>  using a database result set , i loop over all the records ,
>  creating a document object and indexing into ramDirectory and then onto
> the fileSystem
>
>  when I open a IndexReader and output numDoc i get 18740,

It seems even in this case some documents are lost. Do you maybe ignore 
exceptions? Could you build a self-contained test case that shows the 
problem? The interesting question is of course *which* documents are lost 
and if the behaviour is reproducible. The test case will either help you 
to fix the bug in your code, or it will help us fix the bug in Lucene, if 
there is any.

Regards
 Daniel

-- 
http://www.danielnaber.de
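
One way to find out *which* documents are lost, as asked above, is to
compare the IDs read from the table with the IDs stored in the index. The
following is only a minimal sketch in plain Java, not code from this thread:
the class and method names (MissingIds, missing) are hypothetical, and
collecting the two ID lists (from the ResultSet and from the stored ID field
of each indexed document) is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: reports which expected IDs never made it into the index.
public class MissingIds {

    /** Returns the expected IDs that are absent from the indexed IDs, in expected order. */
    public static List<String> missing(List<String> expected, List<String> indexed) {
        Set<String> seen = new HashSet<String>(indexed);
        List<String> result = new ArrayList<String>();
        for (String id : expected) {
            if (!seen.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the table and the index.
        List<String> fromTable = Arrays.asList("1", "2", "3", "4", "5");
        List<String> fromIndex = Arrays.asList("1", "2", "4");
        System.out.println(missing(fromTable, fromIndex)); // prints [3, 5]
    }
}
```

If the missing IDs turn out to be the same ones on every run (for example,
always the last few records added), that points to a reproducible cause
rather than a random loss.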



RE: leakage in RAMDirectory ?

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.
Forgot to mention: Lucene 1.4.2 is the version I am currently using.

