Posted to java-user@lucene.apache.org by Yonghui Zhao <zh...@gmail.com> on 2018/04/12 11:54:36 UTC

what's replacement of FieldCache in Lucene 7

Hi,

I am upgrading my project from Lucene 4 to 7.

FieldCache was removed in Lucene 7. Are doc values the replacement?

However, doc values don't seem to support random access.

I need random access to fetch the values of specific fields quickly.

How can I solve this?

Re: what's replacement of FieldCache in Lucene 7

Posted by Yonghui Zhao <zh...@gmail.com>.
Got it, that makes sense. Thanks, Adrien.

2018-04-13 19:16 GMT+08:00 Adrien Grand <jp...@gmail.com>:

> Queries should be fine: they have been required to produce sorted iterators
> since 5.0, when we removed the acceptsDocsOutOfOrder option on collectors.

Re: what's replacement of FieldCache in Lucene 7

Posted by Adrien Grand <jp...@gmail.com>.
Queries should be fine: they have been required to produce sorted iterators
since 5.0, when we removed the acceptsDocsOutOfOrder option on collectors.
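
A minimal sketch of why in-order matching pairs well with forward-only values, runnable without Lucene. Everything here is a hypothetical stand-in: InOrderScoring, the sorted int arrays, and the default boost of 1.0 are assumptions for illustration, not Lucene classes or behavior. Because the matching doc IDs arrive in increasing order, a single cursor over the per-document values only ever moves forward, so the scoring loop is one linear pass.

```java
import java.util.Arrays;

/** Toy model of scoring with a forward-only per-document value; not a Lucene API. */
final class InOrderScoring {

    /**
     * Multiplies each base score by the boost stored for that document.
     * matchingDocs and docsWithBoost must both be sorted in increasing order.
     * Documents without a stored boost use 1.0 (an assumption of this sketch).
     */
    static double[] score(int[] matchingDocs, double[] baseScores,
                          int[] docsWithBoost, double[] boosts) {
        double[] scores = new double[matchingDocs.length];
        int cursor = 0; // position in docsWithBoost; never moves backwards
        for (int i = 0; i < matchingDocs.length; i++) {
            int doc = matchingDocs[i];
            // Skip stored entries that belong to documents before the current match.
            while (cursor < docsWithBoost.length && docsWithBoost[cursor] < doc) {
                cursor++;
            }
            boolean hasBoost = cursor < docsWithBoost.length && docsWithBoost[cursor] == doc;
            scores[i] = baseScores[i] * (hasBoost ? boosts[cursor] : 1.0);
        }
        return scores;
    }

    public static void main(String[] args) {
        int[] matchingDocs = {0, 3, 8};
        double[] baseScores = {1.0, 2.0, 0.5};
        int[] docsWithBoost = {3, 5, 8};
        double[] boosts = {1.5, 9.0, 4.0};
        // prints [1.0, 3.0, 2.0]
        System.out.println(Arrays.toString(
                score(matchingDocs, baseScores, docsWithBoost, boosts)));
    }
}
```

In real code the cursor would be a per-segment doc-values iterator advanced with advanceExact; how that performs on a given index is something to measure rather than assume.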

On Fri, Apr 13, 2018 at 13:10, Yonghui Zhao <zh...@gmail.com> wrote:

> I can sort the doc IDs and then fetch the fields via doc values.
>
> But another big scenario for the field cache is custom score queries: we use
> the field cache to compute the score, and stored fields can't work there for
> performance reasons.
>
> If I use doc values there too, I must make sure all queries score documents
> in doc ID order. I think this will introduce some performance drop?

Re: what's replacement of FieldCache in Lucene 7

Posted by Yonghui Zhao <zh...@gmail.com>.
I can sort the doc IDs and then fetch the fields via doc values.

But another big scenario for the field cache is custom score queries: we use
the field cache to compute the score, and stored fields can't work there for
performance reasons.

If I use doc values there too, I must make sure all queries score documents
in doc ID order. I think this will introduce some performance drop?
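
The sort-then-fetch idea can be sketched as follows, runnable without Lucene. SortedFetch and its forwardOnly helper are hypothetical names; the IntToLongFunction stands in for any per-document lookup that only tolerates non-decreasing doc IDs, and it is not Lucene's API. The sketch visits the hits in doc ID order but writes each value back to the hit's original position, so the caller still sees results in ranking order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntToLongFunction;

/** Toy model: read values in doc ID order, return them in the caller's order. */
final class SortedFetch {

    /** Returns values[i] for docIds[i], calling the lookup only with non-decreasing doc IDs. */
    static long[] fetchInDocIdOrder(int[] docIds, IntToLongFunction forwardOnlyLookup) {
        // Sort positions by doc ID rather than sorting docIds itself,
        // so each value can be written back to its original slot.
        Integer[] order = new Integer[docIds.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> docIds[i]));

        long[] values = new long[docIds.length];
        for (int position : order) {
            values[position] = forwardOnlyLookup.applyAsLong(docIds[position]);
        }
        return values;
    }

    /** Stand-in lookup that rejects decreasing doc IDs (assumption for illustration). */
    static IntToLongFunction forwardOnly(long[] valuesByDoc) {
        int[] lastDoc = {-1};
        return docId -> {
            if (docId < lastDoc[0]) {
                throw new IllegalStateException(
                        "doc IDs must not decrease: " + docId + " after " + lastDoc[0]);
            }
            lastDoc[0] = docId;
            return valuesByDoc[docId];
        };
    }

    public static void main(String[] args) {
        long[] valuesByDoc = {0, 10, 20, 30, 40, 50, 60, 70};
        int[] hits = {7, 2, 5}; // ranking order, not doc ID order
        // prints [70, 20, 50]
        System.out.println(Arrays.toString(
                fetchInDocIdOrder(hits, forwardOnly(valuesByDoc))));
    }
}
```

With only around a hundred hits per page, the extra sort is small next to the cost of the query itself.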

2018-04-13 17:15 GMT+08:00 Adrien Grand <jp...@gmail.com>:

> Performance may be worse with stored fields indeed. In general Lucene makes
> the assumption that millions of documents are queried but only ~100
> documents are retrieved in the end, so the bottleneck should be query
> processing, not retrieving stored fields.

Re: what's replacement of FieldCache in Lucene 7

Posted by Adrien Grand <jp...@gmail.com>.
Performance may be worse with stored fields indeed. In general Lucene makes
the assumption that millions of documents are queried but only ~100
documents are retrieved in the end, so the bottleneck should be query
processing, not retrieving stored fields.

On Fri, Apr 13, 2018 at 05:27, Yonghui Zhao <zh...@gmail.com> wrote:

> My case is that when I get some docs from Lucene, I also need to get some
> field values of the retrieved docs.
>
> For example, in Lucene 4, I use FieldCache like this:
>
> FieldCache.DEFAULT.getTerms(reader, name, false).get(locDocId).utf8ToString();
>
> FieldCache.DEFAULT.getInts(reader, name, false).get(locDocId);
>
> FieldCache.DEFAULT.getDoubles(reader, name, false).get(locDocId);
>
>
> where docId may not be in ascending order.
>
> Of course, I can use stored fields like this:
>
> Document doc = indexSearcher.doc(docId, storedFields.keySet());
>
>
> But the performance would be worse than with FieldCache.

Re: what's replacement of FieldCache in Lucene 7

Posted by Yonghui Zhao <zh...@gmail.com>.
My case is that when I get some docs from Lucene, I also need to get some
field values of the retrieved docs.

For example, in Lucene 4, I use FieldCache like this:

FieldCache.DEFAULT.getTerms(reader, name, false).get(locDocId).utf8ToString();

FieldCache.DEFAULT.getInts(reader, name, false).get(locDocId);

FieldCache.DEFAULT.getDoubles(reader, name, false).get(locDocId);


where docId may not be in ascending order.

Of course, I can use stored fields like this:

Document doc = indexSearcher.doc(docId, storedFields.keySet());


But the performance would be worse than with FieldCache.


2018-04-12 19:57 GMT+08:00 Adrien Grand <jp...@gmail.com>:

> Hello,
>
> Doc values should indeed be used instead of the field cache. Note that this
> requires adding them to your documents at index time, e.g. with a
> NumericDocValuesField.
>
> Regarding random access, maybe you can use the advanceExact API, which
> exists on all doc-value iterators. Just make sure to never call it on
> decreasing doc IDs. If that doesn't work for you, can you describe your
> use case? Maybe there are better ways to implement what you need.

Re: what's replacement of FieldCache in Lucene 7

Posted by Adrien Grand <jp...@gmail.com>.
Hello,

Doc values should indeed be used instead of the field cache. Note that this
requires adding them to your documents at index time, e.g. with a
NumericDocValuesField.

Regarding random access, maybe you can use the advanceExact API, which
exists on all doc-value iterators. Just make sure to never call it on
decreasing doc IDs. If that doesn't work for you, can you describe your
use case? Maybe there are better ways to implement what you need.
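
To make the advanceExact constraint concrete, here is a minimal sketch that runs without Lucene. ForwardOnlyLongValues is a hypothetical stand-in, not Lucene's NumericDocValues: it only mimics the contract that advanceExact(target) reports whether the target doc has a value and is never called with a doc ID lower than the current one. Throwing on a backwards call is an assumption made for illustration; what a real iterator does in that case should be checked against the Lucene javadoc.

```java
/** Toy forward-only iterator over sparse per-document long values; not a Lucene class. */
final class ForwardOnlyLongValues {
    private final int[] docsWithValue; // doc IDs that have a value, sorted ascending
    private final long[] values;       // values[i] belongs to docsWithValue[i]
    private int index = 0;             // cursor into docsWithValue; only moves forward
    private int currentDoc = -1;
    private boolean hasValue = false;

    ForwardOnlyLongValues(int[] docsWithValue, long[] values) {
        if (docsWithValue.length != values.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.docsWithValue = docsWithValue;
        this.values = values;
    }

    /** Moves to target and returns whether that document has a value. */
    boolean advanceExact(int target) {
        if (target < currentDoc) {
            throw new IllegalArgumentException(
                    "target " + target + " is before current doc " + currentDoc);
        }
        currentDoc = target;
        while (index < docsWithValue.length && docsWithValue[index] < target) {
            index++;
        }
        hasValue = index < docsWithValue.length && docsWithValue[index] == target;
        return hasValue;
    }

    /** Value of the current document; only valid after advanceExact returned true. */
    long longValue() {
        if (!hasValue) {
            throw new IllegalStateException("no value for doc " + currentDoc);
        }
        return values[index];
    }

    public static void main(String[] args) {
        ForwardOnlyLongValues prices =
                new ForwardOnlyLongValues(new int[] {1, 4, 9}, new long[] {100L, 400L, 900L});
        for (int doc : new int[] {1, 3, 4}) { // increasing doc IDs only
            if (prices.advanceExact(doc)) {
                System.out.println("doc " + doc + " -> " + prices.longValue());
            } else {
                System.out.println("doc " + doc + " has no value");
            }
        }
    }
}
```

The boolean result matters: unlike the old field cache arrays, a document without a value is reported explicitly instead of silently reading as a default.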

On Thu, Apr 12, 2018 at 13:54, Yonghui Zhao <zh...@gmail.com> wrote:

> Hi,
>
> I am upgrading my project from Lucene 4 to 7.
>
> FieldCache was removed in Lucene 7. Are doc values the replacement?
>
> However, doc values don't seem to support random access.
>
> I need random access to fetch the values of specific fields quickly.
>
> How can I solve this?
>