
Posted to java-user@lucene.apache.org by Carl Austin <Ca...@detica.com> on 2010/05/11 15:27:31 UTC

FieldCache and 2.9

Hi,

I have been comparing the FieldCache in Lucene 2.9 with the one in
2.4. The load time is massively decreased; however, I am not seeing
any benefit when getting a field cache after reopening an index reader
when I have only added a few extra documents.
A small test class is included below (based on one from Lucid
Imagination) that creates 5 million docs, gets a field cache, creates
another few docs and gets the field cache again. I thought the second
get would be very fast, as only one segment should have changed;
however, the reopen and cache get take more time than the original
load did.

Am I doing something wrong here or have I misunderstood the new segment
changes?

Thanks

Carl


import java.io.File;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ContrivedFCTest {

	public static void main(String[] args) throws Exception {
		// Create a fresh index (create = true) of 5 million documents.
		Directory dir = FSDirectory.open(new File(args[0]));
		IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
				IndexWriter.MaxFieldLength.LIMITED);
		for (int i = 0; i < 5000000; i++) {
			if (i % 100000 == 0) {
				System.out.println(i);
			}
			Document doc = new Document();
			doc.add(new Field("field", "String" + i, Field.Store.NO,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}
		writer.close();

		// Time the first FieldCache load, requested on the top-level reader.
		IndexReader reader = IndexReader.open(dir, true);
		long start = System.currentTimeMillis();
		FieldCache.DEFAULT.getStrings(reader, "field");
		long end = System.currentTimeMillis();
		System.out.println("load time for initial field cache:"
				+ (end - start) / 1000.0f + "s");

		// Append a few more documents to the existing index (create = false).
		writer = new IndexWriter(dir, new SimpleAnalyzer(), false,
				IndexWriter.MaxFieldLength.LIMITED);
		for (int i = 5000001; i < 5000005; i++) {
			if (i % 100000 == 0) {
				System.out.println(i);
			}
			Document doc = new Document();
			doc.add(new Field("field", "String" + i, Field.Store.NO,
					Field.Index.NOT_ANALYZED));
			writer.addDocument(doc);
		}
		writer.close();

		// Reopen and time the second load, again on the top-level reader.
		IndexReader reader2 = reader.reopen(true);
		System.out.println("reader size = " + reader2.numDocs());
		long start2 = System.currentTimeMillis();
		FieldCache.DEFAULT.getStrings(reader2, "field");
		long end2 = System.currentTimeMillis();
		System.out.println("load time for re-opened field cache:"
				+ (end2 - start2) / 1000.0f + "s");
	}
}

This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.  The contents of this email may relate to dealings with other companies within the Detica Limited group of companies.

Detica Limited is registered in England under No: 1337451.

Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.

RE: FieldCache and 2.9

Posted by Carl Austin <Ca...@detica.com>.

Ah, OK, thanks for that.
I had hoped that the field cache would do this for me by going through the subreaders itself. Is this likely to be done in a future release?
I may have to implement a wrapper that does this anyway; if so, I can submit it as a contrib if that would be useful.

Thanks

Carl
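
For illustration, a wrapper of the kind described above might stitch the per-segment arrays into one view addressed by top-level document ID. The following is only a minimal sketch under assumptions: CompositeStringsSketch is a hypothetical name, not a Lucene class, and the per-segment arrays are passed in directly rather than fetched from the FieldCache.

```java
// Hypothetical sketch, not part of Lucene: presents several per-segment
// value arrays as one array indexed by top-level document ID.
class CompositeStringsSketch {
    private final String[][] perSegment;
    // starts[i] is the top-level doc ID of the first document in segment i;
    // starts[perSegment.length] is the total document count.
    private final int[] starts;

    CompositeStringsSketch(String[][] perSegment) {
        this.perSegment = perSegment;
        this.starts = new int[perSegment.length + 1];
        for (int i = 0; i < perSegment.length; i++) {
            starts[i + 1] = starts[i] + perSegment[i].length;
        }
    }

    int size() {
        return starts[perSegment.length];
    }

    String get(int docId) {
        if (docId < 0 || docId >= size()) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        // Linear scan over segments; empty segments are skipped because
        // their start equals the next segment's start.
        int seg = 0;
        while (docId >= starts[seg + 1]) {
            seg++;
        }
        return perSegment[seg][docId - starts[seg]];
    }

    public static void main(String[] args) {
        CompositeStringsSketch view = new CompositeStringsSketch(
                new String[][] {{"String0", "String1"}, {"String2"}});
        System.out.println(view.get(2)); // prints "String2"
    }
}
```

With this layout, a reopen would only need to swap in the arrays for new segments; the arrays for unchanged segments could be reused as they are. A binary search over the start offsets would be the obvious refinement for indexes with many segments.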

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: 11 May 2010 14:41
To: java-user@lucene.apache.org
Subject: Re: FieldCache and 2.9

You are requesting the FieldCache entry from the top-level reader and hence a whole new FieldCache entry must be created.
Lucene 2.9 sorting requests FieldCache entries at the segment level and hence reuses entries for those segments that haven't changed.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FieldCache and 2.9

Posted by Yonik Seeley <yo...@lucidimagination.com>.

You are requesting the FieldCache entry from the top-level reader and
hence a whole new FieldCache entry must be created.
Lucene 2.9 sorting requests FieldCache entries at the segment level
and hence reuses entries for those segments that haven't changed.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague
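
The difference between a top-level entry and per-segment entries can be shown with a small toy model. This is a sketch under assumptions, not Lucene code: Segment and CacheModel are hypothetical stand-ins, the "reader" is just a list of segments, and the cost is counted as the number of values loaded. In Lucene 2.9 itself, the segment readers should be reachable through IndexReader.getSequentialSubReaders(), but check the 2.9 javadoc before relying on that.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes exist in Lucene.
class SegmentCacheSketch {

    /** Hypothetical immutable segment holding one field value per document. */
    static final class Segment {
        final String[] values;
        Segment(String[] values) { this.values = values; }
    }

    /** Caches one entry per key object (by identity) and counts loaded values. */
    static final class CacheModel {
        private final Map<Object, String[]> entries = new IdentityHashMap<>();
        long valuesLoaded = 0;

        /** Entry keyed by the whole reader: a new reader means a full reload. */
        String[] getTopLevel(List<Segment> reader) {
            String[] cached = entries.get(reader);
            if (cached == null) {
                List<String> all = new ArrayList<>();
                for (Segment s : reader) {
                    for (String v : s.values) all.add(v);
                }
                cached = all.toArray(new String[0]);
                valuesLoaded += cached.length;
                entries.put(reader, cached);
            }
            return cached;
        }

        /** Entry keyed by a single segment: unchanged segments hit the cache. */
        String[] getForSegment(Segment segment) {
            String[] cached = entries.get(segment);
            if (cached == null) {
                cached = segment.values.clone();
                valuesLoaded += cached.length;
                entries.put(segment, cached);
            }
            return cached;
        }
    }

    static List<Segment> initialReader() {
        List<Segment> reader = new ArrayList<>();
        reader.add(new Segment(new String[] {"String0", "String1", "String2"}));
        reader.add(new Segment(new String[] {"String3", "String4"}));
        return reader;
    }

    /** Models a reopen: old segments are shared, one new segment is appended. */
    static List<Segment> reopen(List<Segment> old, Segment added) {
        List<Segment> reopened = new ArrayList<>(old);
        reopened.add(added);
        return reopened;
    }

    /** Values loaded by the second get when the cache is keyed by reader. */
    static long topLevelReloadCost() {
        CacheModel cache = new CacheModel();
        List<Segment> reader = initialReader();
        cache.getTopLevel(reader);
        long before = cache.valuesLoaded;
        cache.getTopLevel(reopen(reader, new Segment(new String[] {"String5"})));
        return cache.valuesLoaded - before;
    }

    /** Values loaded by the second get when the cache is keyed by segment. */
    static long perSegmentReloadCost() {
        CacheModel cache = new CacheModel();
        List<Segment> reader = initialReader();
        for (Segment s : reader) cache.getForSegment(s);
        long before = cache.valuesLoaded;
        for (Segment s : reopen(reader, new Segment(new String[] {"String5"}))) {
            cache.getForSegment(s);
        }
        return cache.valuesLoaded - before;
    }

    public static void main(String[] args) {
        System.out.println("top-level reload cost: " + topLevelReloadCost());     // 6
        System.out.println("per-segment reload cost: " + perSegmentReloadCost()); // 1
    }
}
```

In the model, the reopened reader is a new key, so the top-level request reloads all six values, while the per-segment request loads only the one value in the new segment.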


