Posted to java-user@lucene.apache.org by Li Li <fa...@gmail.com> on 2012/06/20 14:49:00 UTC

any good idea for loading fields into memory?

hi all
    I need to return certain fields of all matched documents quickly.
I am now using Document.get(field), but the performance is not good
enough. Originally I used a HashMap to store these fields; it's much
faster, but I have to maintain two storage systems. Now I am
restructuring this project, and I want to store everything in Lucene.
    When I use an IndexSearcher to perform a search, I can get the
related fields by docID. It must be thread safe, and, like the
IndexReader, it should be a snapshot of the index.
    Here are some solutions I can come up with:
    1. StringIndex
       I have considered StringIndex, but some fields need to be
tokenized. Maybe I can use two fields: one tokenized for searching,
the other indexed but not analyzed, with the latter used only for the
StringIndex (see the sketch after this message). If there is no better
solution, maybe I have to use this one.
    2. Associating a Map with each IndexReader
       When the IndexReader is opened or reopened, I would iterate
through each document of this reader and put everything into a map.
The problem is that it's slower, and I don't know whether it's
problematic with NRT.

    Is there any other better solution? Thanks.
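
A minimal sketch of the two-field idea in option 1, against the 3.x Field API
(field names and the value are illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
String titleText = "a quick brown fox";
// tokenized copy, used for full-text search
doc.add(new Field("title", titleText,
		Field.Store.YES, Field.Index.ANALYZED));
// untokenized copy: a single term per document, usable with StringIndex
doc.add(new Field("title_raw", titleText,
		Field.Store.NO, Field.Index.NOT_ANALYZED));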



Re: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
Thanks, I will try it.
On 2012-6-20 8:59 PM, "Danil ŢORIN" <to...@gmail.com> wrote:
>
> I think you are looking for FieldCache.
>
> I'm not sure of the current status in 4.x, but it worked in 2.9/3.x.
> Basically it's an array, so access is quite straightforward, and the
> best part is that the IndexReader manages those for you, so on reopen
> only new segments are read.
>
> The small catch is that FieldCaches are per segment, so you need to be
> careful if you want to retrieve data using global document ids.
> However, if you are building the result set in your own Collector,
> using the FieldCache is quite straightforward.

RE: any good idea for loading fields into memory?

Posted by Paul Hill <pa...@metajure.com>.
OK, fair enough, you want to keep everything very fast. I'm surprised that large documents are slower for searching.
I'm consistently impressed by the search times.  Finding good hit fragments in a big document can be slow, but for me
(searching human-created documents) it never is.

> the reason for the slowdown is getting field values; when I use my wrapped
> IndexSearcher, which loads fields into arrays, it's as fast as before.

I recently read a note somewhere that said not to do getDocument during a search, but that using the (segment-level) field caches is faster.
But I can't say what the tradeoff would be if you wanted most fields at each step of the search.

Good luck,
-Paul

RE: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
30 ms is important.
For one thing, I am restructuring a project and integrating everything
into Lucene, so it should be as fast as before.
For another, the Lucene matching is just a small part; there are
many other steps. Also, 40 ms is the average time; for some queries which
match 10k documents, it takes hundreds of ms.
The reason for the slowdown is getting field values: when I use my wrapped
IndexSearcher, which loads fields into arrays, it's as fast as before.
Why do I use Lucene instead of just a HashMap? First, I do not want to
implement an in-memory inverted index myself. Second, a HashMap would have
to be saved to disk for persistence. Third, modifying a HashMap
transactionally is difficult: while it is being modified, searching threads
should see all changes or nothing. Fourth, dealing with boolean queries is
convenient in Lucene.
On 2012-6-23 1:16 AM, "Paul Hill" <pa...@metajure.com> wrote:

>
> 10 ms vs 40 ms. I'd say: so what?
> Is your overall time noticeably affected by this 30 ms gain?  Does the end
> user notice this 30 ms gain?
> Where is the time going?  Just getting the hits?  Getting all the documents?
> Building the result set as your app uses it?
>
> If it is the hits, have you considered searching on a hash value instead
> of the value of the field?
> If it is getting the documents, are you getting too much but only using a
> little in this particular case?
> If it is building the result set because of the need to re-parse, I would
> look into trying a second multi-valued field with exactly (or closer to)
> what you need in it.
>
> -Paul

RE: any good idea for loading fields into memory?

Posted by Paul Hill <pa...@metajure.com>.
10 ms vs 40 ms. I'd say: so what?
Is your overall time noticeably affected by this 30 ms gain?  Does the end user notice this 30 ms gain?
Where is the time going?  Just getting the hits?  Getting all the documents?  Building the result set as your app uses it?

If it is the hits, have you considered searching on a hash value instead of the value of the field? (Sketched below.)
If it is getting the documents, are you getting too much but only using a little in this particular case?
If it is building the result set because of the need to re-parse, I would look into trying a second multi-valued field with exactly (or closer to) what you need in it.
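
A hypothetical sketch of the hash idea against the 3.x API (names and values
are made up; matches should be re-verified, since distinct values can share a
hash):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// index time: store a compact hash term instead of searching the long value
String longValue = "some very long field value";
Document doc = new Document();
doc.add(new Field("value_hash", Integer.toHexString(longValue.hashCode()),
		Field.Store.NO, Field.Index.NOT_ANALYZED));

// query time: hash the probe value the same way and search the short term
Query q = new TermQuery(new Term("value_hash",
		Integer.toHexString("some very long field value".hashCode())));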

-Paul

> -----Original Message-----
> From: Li Li [mailto:fancyerii@gmail.com]
> our old map implementation takes about 10 ms, while the newer one takes 40
> ms. The reason is we need to return some fields of all hit documents. The
> fields are not very long strings, and the document number is less than 100k.



Re: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
Using a RAMDirectory is not fast enough because field values need to be
deserialized. I have compared it with a HashMap: it is four times slower. Our
old map implementation takes about 10 ms, while the newer one takes 40 ms. The
reason is we need to return some fields of all hit documents. The fields
are not very long strings, and the document number is less than 100k.
On 2012-6-22 5:13 PM, "Danil ŢORIN" <to...@gmail.com> wrote:

> If you can afford it, you could add one additional untokenized stored
> field that contains the serialized (one way or another) form of the
> document.
>
> Add a FieldCache on top of it, and return it right away.
>
> But we are getting into an area where you basically have to keep all
> your documents in memory.
>
> In that situation, maybe it simply doesn't make sense to over-complicate
> things: just keep your index in memory (as it is right now, with no
> additional fields or field caches), and retrieving documents would
> be fast enough simply because all the data is in RAM.

Re: any good idea for loading fields into memory?

Posted by Danil ŢORIN <to...@gmail.com>.
If you can afford it, you could add one additional untokenized stored
field that contains the serialized (one way or another) form of the
document.

Add a FieldCache on top of it, and return it right away.

But we are getting into an area where you basically have to keep all
your documents in memory.

In that situation, maybe it simply doesn't make sense to over-complicate
things: just keep your index in memory (as it is right now, with no
additional fields or field caches), and retrieving documents would
be fast enough simply because all the data is in RAM.
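
For the keep-it-all-in-RAM option, a minimal 3.x sketch (the index path is
illustrative):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// copy the on-disk index into RAM once; searches then never touch the disk
Directory ram = new RAMDirectory(FSDirectory.open(new File("/path/to/index")));
IndexReader reader = IndexReader.open(ram);
IndexSearcher searcher = new IndexSearcher(reader);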


On Fri, Jun 22, 2012 at 3:56 AM, Li Li <fa...@gmail.com> wrote:
> Using a collector and the field cache is a good idea for ranking by a
> certain field's value.
> But I just need to return matched documents' fields. And the field
> cache can't store multi-valued fields?
> I have to store special chars like '\n' to separate the values and split
> the string into a string array at runtime.


Re: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
Using a collector and the field cache is a good idea for ranking by a
certain field's value.
But I just need to return matched documents' fields. And the field
cache can't store multi-valued fields?
I have to store special chars like '\n' to separate the values and split
the string into a string array at runtime (a sketch of that workaround follows).
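
A sketch of that separator workaround, assuming 3.x and a separator that never
occurs in the data (field names and values are illustrative; searcher and
docID are an existing IndexSearcher and a matched document's id):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// index time: pack the values into a single stored-only field
Document doc = new Document();
doc.add(new Field("colors_packed", "red\ngreen\nblue",
		Field.Store.YES, Field.Index.NO));

// query time: split the stored string back into the individual values
String[] colors = searcher.doc(docID).get("colors_packed").split("\n");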

On Fri, Jun 22, 2012 at 5:11 AM, Paul Hill <pa...@metajure.com> wrote:
> I would ask: if you want to look at the whole value of a field during searching, why don't you have just such a field in your index?
> I have an index with several fields that have two versions of the field, both analyzed and unanalyzed.  It works great for me in 3.x (not 4.x).
> Have you read about Collectors?  That is where I find myself working with field caches, but maybe this is not your need. I also properly configured the call to searcher.doc(docId) with the second argument,
> so I only load the fields I will be using in my returned results, not any 'extra' fields used in Filters, Collectors, etc.  If you have a special query that needs to be extra fast, you can change the fields to load just in the special code for that special query.
>
> I hope that helps,
>
> -Paul
>
>> -----Original Message-----
>> From: Li Li [mailto:fancyerii@gmail.com]
>> but as I can remember, in 2.9.x the FieldCache could only apply to indexed but not analyzed fields.
>



RE: any good idea for loading fields into memory?

Posted by Paul Hill <pa...@metajure.com>.
I would ask: if you want to look at the whole value of a field during searching, why don't you have just such a field in your index?
I have an index with several fields that have two versions of the field, both analyzed and unanalyzed.  It works great for me in 3.x (not 4.x).
Have you read about Collectors?  That is where I find myself working with field caches, but maybe this is not your need. I also properly configured the call to searcher.doc(docId) with the second argument,
so I only load the fields I will be using in my returned results, not any 'extra' fields used in Filters, Collectors, etc. (a sketch follows).  If you have a special query that needs to be extra fast, you can change the fields to load just in the special code for that special query.
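
A minimal 3.x sketch of that second argument (field names are illustrative;
searcher and docID are an existing IndexSearcher and a hit's document id):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.MapFieldSelector;

// load only the stored fields this result page actually needs
FieldSelector onlyNeeded = new MapFieldSelector("id", "title");
Document doc = searcher.doc(docID, onlyNeeded);
String title = doc.get("title");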

I hope that helps,

-Paul

> -----Original Message-----
> From: Li Li [mailto:fancyerii@gmail.com]
> but as I can remember, in 2.9.x the FieldCache could only apply to indexed but not analyzed fields.


Re: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
I can't use 4.0 because it's not released; our company requires using
stable versions.

So I decided to wrap an IndexSearcher with the fields' values in memory,
like this. I also copied all the code of
org.apache.lucene.search.SearcherManager and replaced IndexSearcher with
my IndexSearcherWithFields.

Any suggestions for this solution?

import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class IndexSearcherWithFields {
	protected static Logger logger =
			Logger.getLogger(IndexSearcherWithFields.class);
	// single-valued fields to preload; each maps to an Object[] of Strings
	private Collection<String> inMemoryFields;
	// multi-valued fields to preload; each maps to an Object[] of String[]
	private Collection<String> inMemoryMultiValueFields;
	// field name -> per-document values, indexed by docID
	private Map<String, Object[]> fieldsValues = new HashMap<String, Object[]>();
	private IndexSearcher searcher;

	public IndexReader getIndexReader() {
		return searcher.getIndexReader();
	}

	public IndexSearcherWithFields(IndexSearcher searcher,
			Collection<String> inMemoryFields,
			Collection<String> inMemoryMultiValueFields) throws IOException {
		this.searcher = searcher;
		this.inMemoryFields = inMemoryFields;
		this.inMemoryMultiValueFields = inMemoryMultiValueFields;
		this.warmup();
	}

	public final IndexSearcher getSearcher() {
		return searcher;
	}

	public Object[] getField(String fn) {
		return fieldsValues.get(fn);
	}

	// read every document's stored fields once, up front, so later lookups
	// are O(1) array reads; note this walks all slots up to maxDoc() and
	// assumes the reader has no deleted documents
	private void warmup() throws IOException {
		long start = System.currentTimeMillis();
		IndexReader reader = searcher.getIndexReader();
		int docSize = reader.maxDoc();
		for (String f : inMemoryFields) {
			fieldsValues.put(f, new Object[docSize]);
		}
		for (String f : inMemoryMultiValueFields) {
			fieldsValues.put(f, new Object[docSize]);
		}

		for (int i = 0; i < docSize; i++) {
			Document doc = reader.document(i);
			for (String f : inMemoryFields) {
				fieldsValues.get(f)[i] = doc.get(f);
			}
			for (String f : inMemoryMultiValueFields) {
				fieldsValues.get(f)[i] = doc.getValues(f);
			}
		}
		logger.debug("warm up fields time: "
				+ (System.currentTimeMillis() - start) + " ms.");
	}
}
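
A hypothetical usage sketch of the wrapper above (the Directory dir and the
field names are made up; docID must come from this searcher's reader):

import java.util.Arrays;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

IndexSearcherWithFields s = new IndexSearcherWithFields(
		new IndexSearcher(IndexReader.open(dir)),
		Arrays.asList("title"),   // single-valued fields to preload
		Arrays.asList("tags"));   // multi-valued fields to preload
Object[] titles = s.getField("title");
String title = (String) titles[docID];  // O(1) lookup, no stored-field read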


On Wed, Jun 20, 2012 at 11:37 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Right, the field must have a single token for the FieldCache.
>
> But if you are on 4.x you can use DocTermOrds
> (FieldCache.getDocTermOrds), which allows multiple tokens per
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com



Re: any good idea for loading fields into memory?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Right, the field must have a single token for the FieldCache.

But if you are on 4.x you can use DocTermOrds
(FieldCache.getDocTermOrds), which allows multiple tokens per
field (sketched below).

Mike McCandless

http://blog.mikemccandless.com
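
A sketch against the 4.x API as it eventually shipped (4.0 was unreleased at
the time of this thread); the AtomicReader, docID, and field name are
illustrative:

import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

static void printValues(AtomicReader reader, int docID) throws IOException {
	// un-inverts the "tags" field for this segment; multiple tokens per
	// document are allowed, unlike the plain FieldCache arrays
	SortedSetDocValues ords = FieldCache.DEFAULT.getDocTermOrds(reader, "tags");
	ords.setDocument(docID);  // segment-local docID
	BytesRef term = new BytesRef();
	long ord;
	while ((ord = ords.nextOrd()) != SortedSetDocValues.NO_MORE_ORDS) {
		ords.lookupOrd(ord, term);  // resolve the ordinal to term bytes
		System.out.println(term.utf8ToString());
	}
}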

On Wed, Jun 20, 2012 at 9:47 AM, Li Li <fa...@gmail.com> wrote:
> but as I can remember, in 2.9.x the FieldCache could only apply to indexed
> but not analyzed fields.



Re: any good idea for loading fields into memory?

Posted by Li Li <fa...@gmail.com>.
But as I can remember, in 2.9.x the FieldCache could only apply to indexed
but not analyzed fields.
On 2012-6-20 8:59 PM, "Danil ŢORIN" <to...@gmail.com> wrote:

> I think you are looking for FieldCache.

Re: any good idea for loading fields into memory?

Posted by Danil ŢORIN <to...@gmail.com>.
I think you are looking for FieldCache.

I'm not sure of the current status in 4.x, but it worked in 2.9/3.x.
Basically it's an array, so access is quite straightforward, and the
best part is that the IndexReader manages those for you, so on reopen
only new segments are read.

The small catch is that FieldCaches are per segment, so you need to be
careful if you want to retrieve data using global document ids.
However, if you are building the result set in your own Collector,
using the FieldCache is quite straightforward, as sketched below.
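
A minimal 3.x sketch of that Collector pattern (the field name "title_raw" is
illustrative; it must be indexed and untokenized):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class FieldGatheringCollector extends Collector {
	private final List<String> values = new ArrayList<String>();
	private String[] cached;  // the current segment's FieldCache array

	@Override
	public void setScorer(Scorer scorer) {}

	@Override
	public void setNextReader(IndexReader reader, int docBase) throws IOException {
		// FieldCache entries are per segment; collect() receives
		// segment-local docIDs, so no docBase arithmetic is needed here
		cached = FieldCache.DEFAULT.getStrings(reader, "title_raw");
	}

	@Override
	public void collect(int doc) {
		values.add(cached[doc]);  // O(1) array read, no stored-field access
	}

	@Override
	public boolean acceptsDocsOutOfOrder() {
		return true;
	}

	public List<String> getValues() {
		return values;
	}
}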


