Posted to java-user@lucene.apache.org by Rupinder Singh Mazara <rs...@ebi.ac.uk> on 2004/10/05 20:31:40 UTC
leakage in RAMDirectory ?
Hi all,
The following is some code that I use to index the contents of a table (there
are 18746 records in the table).
Using a database result set, I loop over all the records,
creating a Document object for each and indexing it into a RAMDirectory and then onto the
file system.
When I open an IndexReader and output numDocs() I get 18740.
However, on running the same code but using an FSDirectory object, on
opening an IndexReader I get 18476.
Has anyone else come across this behaviour? The JDK being used is 1.4.1.
public class JournalIndexer extends JournalConstants {
    IndexWriter ramWriter;
    Directory ramDirectory;
    String dir;

    public JournalIndexer(String dir) throws Exception {
        this.dir = dir;
        ramDirectory = new RAMDirectory();
        ramWriter = new IndexWriter(ramDirectory, new SimpleAnalyzer(), true);
    }

    public static void main(String args[]) throws Exception {
        Statement stmt = connection.createStatement();
        JournalIndexer indexer = new JournalIndexer("journals");
        int main_counter = 0;
        // SELECT ID, JOURNALTITLE, NLM_ID, ISSN, MEDLINE_ABBREVIATION, ISO_ABBREVIATION, ESSN "+
        ResultSet rs = stmt.executeQuery(sqlFetchJournals);
        while (rs.next()) {
            Journal journal = new Journal();
            // set values
            main_counter++;
            indexer.add(journal);
        }
        indexer.close();
    }

    int count = 0;

    public void add(Journal journal) throws Exception {
        Document j_doc = new Document();
        // Field(String name, String string, boolean store, boolean index, boolean token)
        Field id = new Field(ID, "" + journal.getId(), true, true, false);
        j_doc.add(id);
        ramWriter.addDocument(j_doc);
        count++;
    }

    public void close() throws Exception {
        IndexWriter fileWriter = new IndexWriter(
            FSDirectory.getDirectory(dir, true), new SimpleAnalyzer(), true);
        Directory dirs[] = { ramDirectory };
        fileWriter.addIndexes(dirs);
        fileWriter.optimize();
        fileWriter.close();
    }

    class JournalAnalyzer extends Analyzer {
        public TokenStream tokenStream(String field, Reader reader) {
            TokenStream result = new WhitespaceTokenizer(reader);
            result = new LowerCaseFilter(result);
            return result;
        }
    }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
RE: leakage in RAMDirectory ?
Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.
Hi,
The major issue is that when using FSDirectory and indexing straight to a directory there are no missing entries,
whereas when indexing via RAMDirectory I get missing entries.
I am currently investigating which entries are missing. Since the application is configured to shut down in
the event of an Exception, either all records get indexed or none.
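One way to find out which entries are missing could look roughly like the sketch below. It assumes the IDs from the table and the IDs read back from the index have each already been collected as strings; the class name MissingIds and the method missing are hypothetical, the Lucene calls that would fill indexedIds are left out, and it is written for a current JDK rather than 1.4.1.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MissingIds {
    // Hypothetical helper: returns the IDs that are in the table
    // but not in the index, in sorted (string) order.
    public static Set<String> missing(Iterable<String> tableIds, Iterable<String> indexedIds) {
        Set<String> result = new TreeSet<>();
        for (String id : tableIds) {
            result.add(id);
        }
        for (String id : indexedIds) {
            result.remove(id);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample IDs, for illustration only.
        List<String> tableIds = Arrays.asList("1", "2", "3", "4");
        List<String> indexedIds = Arrays.asList("1", "3");
        System.out.println(missing(tableIds, indexedIds)); // prints [2, 4]
    }
}
```

If the missing IDs turn out to be the same ones on every run, that would also answer whether the behaviour is reproducible.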
Rupinder
>-----Original Message-----
>From: Daniel Naber [mailto:daniel.naber@t-online.de]
>Sent: 06 October 2004 20:22
>To: Lucene Users List
>Subject: Re: leakage in RAMDirectory ?
>
>
>On Tuesday 05 October 2004 20:31, Rupinder Singh Mazara wrote:
>
>> (there
>> are 18746 records in the table.)
>> Using a database result set, I loop over all the records,
>> creating a Document object for each and indexing it into a RAMDirectory and then onto
>> the file system.
>>
>> When I open an IndexReader and output numDocs() I get 18740.
>
>It seems even in this case some documents are lost. Do you maybe ignore
>exceptions? Could you build a self-contained test case that shows the
>problem? The interesting question is of course *which* documents are lost
>and if the behaviour is reproducible. The test case will either help you
>to fix the bug in your code, or it will help us fix the bug in Lucene, if
>there is any.
>
>Regards
> Daniel
>
>--
>http://www.danielnaber.de
Re: leakage in RAMDirectory ?
Posted by Daniel Naber <da...@t-online.de>.
On Tuesday 05 October 2004 20:31, Rupinder Singh Mazara wrote:
> (there
> are 18746 records in the table.)
> Using a database result set, I loop over all the records,
> creating a Document object for each and indexing it into a RAMDirectory and then onto
> the file system.
>
> When I open an IndexReader and output numDocs() I get 18740.
It seems even in this case some documents are lost. Do you maybe ignore
exceptions? Could you build a self-contained test case that shows the
problem? The interesting question is of course *which* documents are lost
and if the behaviour is reproducible. The test case will either help you
to fix the bug in your code, or it will help us fix the bug in Lucene, if
there is any.
Regards
Daniel
--
http://www.danielnaber.de
RE: leakage in RAMDirectory ?
Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.
Forgot to mention: Lucene 1.4.2 is the version I am currently using.