Posted to dev@lucene.apache.org by "Adam H." <ji...@gmail.com> on 2010/12/06 00:12:15 UTC

FieldCache usage for custom field collapse in solr 1.4

Hey,
I'm trying to use the Lucene FieldCache for a custom field collapsing
implementation: basically I'm collapsing on a non-stored field,
and so am using the FieldCache to retrieve field values at run time.

I noticed I'm getting some OOMs after deploying it, and after looking into
it for a bit, figured that it might have to do with using a call like this:

StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
collapseField);

where 'reader' is the instance of the SolrIndexReader passed along to the
component with the ResponseBuilder.SolrQueryRequest object.

As I understand, this can double memory usage due to (re)loading this
FieldCache on a reader-wide basis rather than on a per-segment basis?
If so, what would be a way to migrate this code to use a per-segment cache?
I'm not sure I understand the semantics there at all...

Any help will be greatly appreciated, thanks a lot!

Adam

Re: FieldCache usage for custom field collapse in solr 1.4

Posted by "Adam H." <ji...@gmail.com>.
One more comment/question -
Having looked at the Solr stats panel, I do not see detailed memory usage
for the field I'm collapsing on in the Lucene FieldCache entry listings.

As I understand (after having looked through this ticket:
https://issues.apache.org/jira/browse/SOLR-1292 ), this means that it's not
an 'insanity' instance,
and so I am not actually using double the memory, but rather only have this
field in the FieldCache at the whole-index level.

This got me thinking - if I'm not using any segment-level field caching for
this field, there's no reason not to use an index-wide one,
as long as I can guarantee that's the only use case for this field in the
FieldCache... is this correct?

Thanks again for helping me out with this delicate subject :)

Adam


Re: FieldCache usage for custom field collapse in solr 1.4

Posted by "Adam H." <ji...@gmail.com>.
Ah! So just so I can get cracking on this - can you be a little more
specific? E.g.,

in my component implementation that runs in the request handling after the
normal QueryComponent,
how would I access the specific field value for the documents that were
retrieved?

I.e., how would it fit into code like this, if at all:

// docList is the matching documents for given offset/rows/query
DocIterator it = docList.iterator();

while (it.hasNext()) {
    int docId = it.next();
    float score = it.score();

    // this would've worked if this was a stored field:
    // reader.document(docId).get(fieldName)
    ??
}
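For readers following along, here is a minimal sketch of what could replace the `??` when the value comes from a top-level cache entry such as the StringIndex in the first message. It assumes the two public arrays that Lucene 2.9's FieldCache.StringIndex exposes (`lookup` and `order`), with slot 0 of `lookup` holding null for documents without a value, as I understand it. StringIndexSketch and FieldValueLoopSketch are hypothetical stand-ins so the example runs without Lucene on the classpath; they are not Lucene or Solr classes.

```java
import java.util.Arrays;

/** Stand-in for the two public arrays of Lucene 2.9's FieldCache.StringIndex (hypothetical, not Lucene code). */
final class StringIndexSketch {
    final String[] lookup; // sorted unique terms; slot 0 is null, meaning "no value"
    final int[] order;     // order[docId] = index into lookup

    StringIndexSketch(String[] lookup, int[] order) {
        this.lookup = lookup;
        this.order = order;
    }

    /** Field value for a document, or null when the document has no term in the field. */
    String valueOf(int docId) {
        return lookup[order[docId]];
    }
}

final class FieldValueLoopSketch {
    /** Mirrors the loop in the question: resolve each matching doc ID to its field value. */
    static String[] valuesFor(int[] matchingDocIds, StringIndexSketch index) {
        String[] values = new String[matchingDocIds.length];
        for (int i = 0; i < matchingDocIds.length; i++) {
            values[i] = index.valueOf(matchingDocIds[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        StringIndexSketch index = new StringIndexSketch(
                new String[] {null, "apple", "pear"},
                new int[] {1, 2, 0, 1});
        // prints [apple, apple, null]
        System.out.println(Arrays.toString(valuesFor(new int[] {0, 3, 2}, index)));
    }
}
```

With the real class, the equivalent lookup inside the loop would presumably be `fieldCacheVals.lookup[fieldCacheVals.order[docId]]`, where docId is a top-level document ID.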




Re: FieldCache usage for custom field collapse in solr 1.4

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Dec 6, 2010 at 5:48 PM, Adam H. <ji...@gmail.com> wrote:
> In other words, using a per-segment fieldcache collection as a
> post-processing step (e.g after QueryComponent did its collection) does not
> seem at all trivial, if at all possible ( is it possible? )

Sure, it's possible, and not too hard (as long as no sort field involves score).
Just instruct the QueryComponent to retrieve the set of all matching
documents, then you can use that to run them through whatever
collectors you want again.  I've been meaning to implement this
optimization to field collapsing...

Depending on the details, either replacing the QueryComponent with
your custom one, or inserting an additional component after the query
component could make sense.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
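A rough sketch of the post-processing idea described above: take the full set of matching top-level doc IDs, then replay them segment by segment so that each callback sees segment-local IDs (which is what a per-segment FieldCache entry is indexed by). Everything here is illustrative: SegmentCollectorSketch and SegmentReplaySketch are hypothetical names, the segment start offsets are passed in as a plain array, and scores are ignored, in line with the caveat about sort fields that involve score.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback, loosely shaped like a per-segment collector (not the real Lucene interface). */
interface SegmentCollectorSketch {
    void setNextSegment(int segmentIndex, int docBase);
    void collect(int segmentLocalDoc);
}

final class SegmentReplaySketch {
    /**
     * Replays ascending top-level doc IDs segment by segment.
     * segmentStarts[i] is the first top-level doc ID of segment i (segmentStarts[0] is 0);
     * maxDoc is the total number of documents across all segments.
     */
    static void replay(int[] sortedDocIds, int[] segmentStarts, int maxDoc,
                       SegmentCollectorSketch collector) {
        int pos = 0;
        for (int seg = 0; seg < segmentStarts.length; seg++) {
            int base = segmentStarts[seg];
            int end = (seg + 1 < segmentStarts.length) ? segmentStarts[seg + 1] : maxDoc;
            collector.setNextSegment(seg, base);
            // Hand over every match that falls inside this segment, rebased to a local ID.
            while (pos < sortedDocIds.length && sortedDocIds[pos] < end) {
                collector.collect(sortedDocIds[pos] - base);
                pos++;
            }
        }
    }

    public static void main(String[] args) {
        final List<String> events = new ArrayList<>();
        replay(new int[] {1, 4, 5, 9}, new int[] {0, 4, 8}, 10, new SegmentCollectorSketch() {
            @Override public void setNextSegment(int segmentIndex, int docBase) {
                events.add("segment " + segmentIndex + " (docBase " + docBase + ")");
            }
            @Override public void collect(int segmentLocalDoc) {
                events.add("  local doc " + segmentLocalDoc);
            }
        });
        events.forEach(System.out::println);
    }
}
```

In a real component, the callback would fetch the per-segment cache entry in setNextSegment and index it with the local doc ID in collect.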


Re: FieldCache usage for custom field collapse in solr 1.4

Posted by "Adam H." <ji...@gmail.com>.
So,
summing up all the information I now have, and the fact that I have some
additional custom components that use the FieldCache,
such that solving field collapsing by migrating to Solr 4.0
is not a complete solution to my problems,

it seems to me more and more like I might have to actually implement a
custom Solr QueryComponent, to which I will pass
multiple collectors (perhaps via some kind of MultiCollector interface,
similar to what Grouping uses) which will do their appropriate field value
collection/aggregation
as results are being fetched.

In other words, using a per-segment fieldcache collection as a
post-processing step (e.g after QueryComponent did its collection) does not
seem at all trivial, if at all possible ( is it possible? )
Is this accurate?

Thanks again for all the info here..

Adam
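To make the multiple-collector idea concrete, here is a small sketch of a fan-out wrapper that forwards each per-segment callback to several delegates, so one pass over the results can feed several aggregations. The interface and class names (DocCollectorSketch, FanOutCollectorSketch, CountingCollectorSketch) are hypothetical and only loosely modeled on how a per-segment collector behaves; this is not the interface Grouping uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-segment callback; the names are illustrative, not Solr or Lucene API. */
interface DocCollectorSketch {
    void setNextSegment(int docBase);
    void collect(int segmentLocalDoc);
}

/** Forwards every callback to each delegate. */
final class FanOutCollectorSketch implements DocCollectorSketch {
    private final List<DocCollectorSketch> delegates;

    FanOutCollectorSketch(List<? extends DocCollectorSketch> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void setNextSegment(int docBase) {
        for (DocCollectorSketch d : delegates) {
            d.setNextSegment(docBase);
        }
    }

    @Override
    public void collect(int segmentLocalDoc) {
        for (DocCollectorSketch d : delegates) {
            d.collect(segmentLocalDoc);
        }
    }
}

/** Example delegate: counts hits and remembers the highest top-level doc ID seen. */
final class CountingCollectorSketch implements DocCollectorSketch {
    int count;
    int maxGlobalDoc = -1;
    private int docBase;

    @Override
    public void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    @Override
    public void collect(int segmentLocalDoc) {
        count++;
        // Rebase the local ID to a top-level ID by adding the segment's offset.
        maxGlobalDoc = Math.max(maxGlobalDoc, docBase + segmentLocalDoc);
    }
}
```

Each delegate keeps its own docBase, so delegates stay independent of one another and can be added or removed without touching the driving loop.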


Re: FieldCache usage for custom field collapse in solr 1.4

Posted by Ryan McKinley <ry...@gmail.com>.
On Mon, Dec 6, 2010 at 4:02 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Mon, Dec 6, 2010 at 3:41 PM, Adam H. <ji...@gmail.com> wrote:
>> Fair enough - I might give it a shot if, to your mind, most functionality is
>> compatible with Solr 1.4.1 and fairly stable?
>
> Yes, the external APIs are very compatible.
> The internal APIs - not so much.
> You should reindex also.

And not be (too) surprised if things change before the official 4.x
release -- the chances are good that something will change that may
require reindexing.

ryan



Re: FieldCache usage for custom field collapse in solr 1.4

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Dec 6, 2010 at 3:41 PM, Adam H. <ji...@gmail.com> wrote:
> Fair enough - I might give it a shot if, to your mind, most functionality is
> compatible with Solr 1.4.1 and fairly stable?

Yes, the external APIs are very compatible.
The internal APIs - not so much.
You should reindex also.

> One last Q regarding correct usage of per-segment FieldCache in Solr
> components -
>
> since this is something I might also have issues with elsewhere, and I
> suspect other people who work on custom logic might as well,
> I think it might be useful to have some documentation and/or a simple
> programmatic interface for implementing
> the correct access path to these inside a custom SolrComponent.
>
> I looked around the Grouping code a bit and have yet to fully understand
> what's going on, but is the ValueSource supposed to take care of access to
> the underlying field?

Yes - you can actually group on arbitrary function queries even.
That will be more useful when we add some bucketing functions.

-Yonik
http://www.lucidimagination.com
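As an illustration of grouping on a computed value rather than on a raw field, the sketch below groups doc IDs by whatever key a function returns for each doc, and pairs it with a fixed-width bucketing function. This is only a toy model of the idea: FunctionGroupingSketch, groupBy and bucket are hypothetical names, the per-document values live in a plain array, and none of this reflects how Solr's ValueSource or Grouping code is actually written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntToLongFunction;

final class FunctionGroupingSketch {
    /** Groups doc IDs by the key that keyFunction computes for each doc. */
    static Map<Long, List<Integer>> groupBy(int[] docIds, IntToLongFunction keyFunction) {
        Map<Long, List<Integer>> groups = new TreeMap<>();
        for (int doc : docIds) {
            long key = keyFunction.applyAsLong(doc);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(doc);
        }
        return groups;
    }

    /** Hypothetical bucketing function: maps a value to the start of its fixed-width bucket. */
    static long bucket(long value, long width) {
        return Math.floorDiv(value, width) * width;
    }

    public static void main(String[] args) {
        long[] prices = {5, 12, 19, 27}; // made-up per-document values, indexed by doc ID
        Map<Long, List<Integer>> groups =
                groupBy(new int[] {0, 1, 2, 3}, doc -> bucket(prices[doc], 10));
        // prints {0=[0], 10=[1, 2], 20=[3]}
        System.out.println(groups);
    }
}
```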



Re: FieldCache usage for custom field collapse in solr 1.4

Posted by "Adam H." <ji...@gmail.com>.
Fair enough - I might give it a shot if, to your mind, most functionality is
compatible with Solr 1.4.1 and fairly stable?

One last Q regarding correct usage of per-segment FieldCache in Solr
components -

since this is something I might also have issues with elsewhere, and I
suspect other people who work on custom logic might as well,
I think it might be useful to have some documentation and/or a simple
programmatic interface for implementing
the correct access path to these inside a custom SolrComponent.

I looked around the Grouping code a bit and have yet to fully understand
what's going on, but is the ValueSource supposed to take care of access to
the underlying field?


Re: FieldCache usage for custom field collapse in solr 1.4

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Dec 6, 2010 at 3:24 PM, Adam H. <ji...@gmail.com> wrote:
> Hey Yonik.
> Thanks for clarifying.
> The reason I went with rolling my own - I asked previously whether there's any
> plan to back-port the field collapse to Solr 1.4 and
> I understood that it's not at all straightforward.

Ahhh... I'd just use trunk if possible ;-)

The risks to being in production on custom code that no one else uses
is perhaps greater than running on a widely used development version.

But yes... I don't see a backport happening for 1.4

-Yonik
http://www.lucidimagination.com



Re: FieldCache usage for custom field collapse in solr 1.4

Posted by "Adam H." <ji...@gmail.com>.
Hey Yonik.
Thanks for clarifying.
The reason I went with rolling my own - I asked previously whether there's any
plan to back-port the field collapse to Solr 1.4 and
I understood that it's not at all straightforward.

If you think it'll be fairly easy to look at the new code in Solr 4.0 trunk
and use that as a basis, for example, I'd go ahead and do that.

Q - does the field collapse component expect the field to collapse on to be
stored? Or does it also use field cache trickery?

Thanks,
Adam


Re: FieldCache usage for custom field collapse in solr 1.4

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sun, Dec 5, 2010 at 6:12 PM, Adam H. <ji...@gmail.com> wrote:
> StringIndex fieldCacheVals = FieldCache.DEFAULT.getStringIndex(reader,
> collapseField);
>
> where 'reader' is the instance of the SolrIndexReader passed along to the
> component with the ResponseBuilder.SolrQueryRequest object.
>
> As I understand, this can double memory usage due to (re)loading this
> fieldcache on a reader-wide basis rather than on a per segment basis?

Yep.  Sorting and function queries use per-segment FieldCache entries.
So if you also request a FieldCache from the top level reader, it
won't reuse the per-segment caches and hence will take up 2x memory
over just using per-segment.

Solr's field collapsing already works on a per-segment basis... if
your needs are at all general, it could make sense to try and get it
rolled into solr rather than implementing custom code.

-Yonik
http://www.lucidimagination.com
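The 2x figure can be pictured with a toy cache that, like FieldCache as I understand it, is keyed on the reader instance. Entries built for the individual segment readers and an entry built for the top-level reader are separate keys, so the same per-document data ends up held twice. ReaderKeyedCacheSketch is a hypothetical stand-in for illustration (one field only, plain Object keys in place of readers) and not Lucene's implementation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy cache keyed on reader identity; a stand-in for illustration, not Lucene's FieldCache. */
final class ReaderKeyedCacheSketch {
    private final Map<Object, int[]> entries = new IdentityHashMap<>();

    /** Returns the cached per-document array for this reader, building it on first use. */
    int[] get(Object reader, int maxDoc) {
        return entries.computeIfAbsent(reader, r -> new int[maxDoc]);
    }

    /** Total number of per-document slots currently held across all entries. */
    long cachedSlots() {
        long total = 0;
        for (int[] perDoc : entries.values()) {
            total += perDoc.length;
        }
        return total;
    }

    public static void main(String[] args) {
        Object segmentA = new Object();
        Object segmentB = new Object();
        Object topLevel = new Object(); // stands for the reader spanning both segments

        ReaderKeyedCacheSketch cache = new ReaderKeyedCacheSketch();
        cache.get(segmentA, 4);  // e.g. populated by sorting or a function query
        cache.get(segmentB, 6);
        System.out.println(cache.cachedSlots()); // prints 10

        cache.get(topLevel, 10); // a top-level request cannot reuse the segment entries
        System.out.println(cache.cachedSlots()); // prints 20
    }
}
```

If nothing else ever requests the per-segment entries for that field, only the top-level entry exists and there is no doubling, which matches the observation earlier in the thread.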
