
Posted to solr-dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2006/05/10 19:08:34 UTC

request handler and caches

I build a "facet" cache in my request handler, but I need it to get  
refreshed when the index changes.  How can my custom request handler  
manage this cache and get notified when the index changes?

Thanks,
	Erik

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

I almost forgot ... if/when you want to "apply" some of those facets to a
query provided by your user, put the queries for each facet into a list
and use...

List<Query> facetsToApply = ...
DocList result = searcher.getDocList(mainQuery, facetsToApply,
                                     yourSort, 0, 20, searcher.GET_SCORES);

...and the filterCache will be used for each facet, the cached DocSets will
all be intersected, and the resulting DocSet will be converted to a Filter
that will be applied when your mainQuery is executed.
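
To make the intersect-then-filter step concrete, here is a minimal standalone
sketch. It uses java.util.BitSet in place of Solr's DocSet and does no caching;
the class and method names are made up for illustration and are not part of
Solr.

```java
import java.util.BitSet;
import java.util.List;

/** Standalone illustration only: java.util.BitSet stands in for Solr's DocSet. */
final class FacetFilterSketch {

    /** ANDs every facet bit set together; with no facets, every document passes. */
    static BitSet intersectAll(List<BitSet> facets, int maxDoc) {
        BitSet filter = new BitSet(maxDoc);
        filter.set(0, maxDoc);          // start with all documents allowed
        for (BitSet facet : facets) {
            filter.and(facet);          // keep only docs present in every facet
        }
        return filter;
    }

    /** Restricts the main query's matches to documents that pass the filter. */
    static BitSet applyFilter(BitSet mainQueryHits, BitSet filter) {
        BitSet result = (BitSet) mainQueryHits.clone();  // leave the input untouched
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        BitSet year = new BitSet();
        year.set(1); year.set(2); year.set(3);
        BitSet genre = new BitSet();
        genre.set(2); genre.set(3); genre.set(4);
        BitSet hits = new BitSet();
        hits.set(3); hits.set(5);

        BitSet filter = intersectAll(List.of(year, genre), 8);
        System.out.println(applyFilter(hits, filter));
    }
}
```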


-Hoss

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

I was so preoccupied with trying to understand why your cache wasn't
working, that i didn't even register what you said about how you are using
it...

: My cache is really just a static cache of BitSet's for a fixed set of
: fields and their values.  With my current index size, creating the
: cache is incredibly fast (a second or so), but the index will grow
: much larger.

	...

: For a fixed set of fields (currently 4 or so of them) I'm building a
: HashMap keyed by field name, with the values of each key also a
: HashMap, keyed by term value.  The value of the inner HashMap is a
: BitSet representing all documents that have that value for that
: field.  These BitSets are used for a faceted browser and ANDed
: together based on user criteria, as well as combined with full-text
: queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
: something Solr already helps provide?

Solr definitely makes this easier.  All you really need to keep track of
(either in your user cache, or in hardcoded logic) is the Queries you want
to have faceting on (TermQueries grouped by field it sounds like).  If you
want to know how many docs any two facets have in common (or that
your user's query has in common with a facet) use...

    int count = searcher.numDocs(facetQ1, facetQ2);

...or if you just want to know the number of docs in a single facet use
searcher.getDocSet(q).size().  (There's also getDocSet(List<Query>) if you
have an arbitrary number of facets you want to intersect.)

Just about all of the methods in SolrIndexSearcher will automatically
cache that DocSet in the filterCache so that any time you do
anything involving those Queries no actual search is done, and
the cache will be autowarmed whenever a newSearcher is opened.

If you size the filterCache big enough, and register a seed query in the
firstSearcher listener you'll never spend time waiting for any of the
facet DocSets to be calculated.
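
The caching behaviour described above can be pictured with a small standalone
sketch. This is not Solr's filterCache: the "query" is just a String key, the
search function is a hypothetical stand-in for running a real query, and the
class is not thread-safe. It only shows the idea that a doc set is computed
once per query and that counting two facets in common is the size of an
intersection.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Standalone illustration; a "query" is a String key here, not a Lucene Query. */
final class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> search;  // hypothetical: runs the real search
    private int searchesRun = 0;

    FilterCacheSketch(Function<String, BitSet> search) {
        this.search = search;
    }

    /** Returns the cached doc set for a query, running the search only on a miss. */
    BitSet docSet(String query) {
        BitSet cached = cache.get(query);
        if (cached == null) {
            cached = search.apply(query);
            searchesRun++;
            cache.put(query, cached);
        }
        return cached;
    }

    /** Number of documents matching both queries: the size of the intersection. */
    int numDocs(String q1, String q2) {
        BitSet both = (BitSet) docSet(q1).clone();  // clone so the cached set is untouched
        both.and(docSet(q2));
        return both.cardinality();
    }

    int searchesRun() {
        return searchesRun;
    }

    public static void main(String[] args) {
        Map<String, BitSet> index = new HashMap<>();
        BitSet pc = new BitSet();
        pc.set(0); pc.set(1); pc.set(2);
        BitSet y1870 = new BitSet();
        y1870.set(1); y1870.set(2); y1870.set(5);
        index.put("platform:pc", pc);
        index.put("year:1870", y1870);

        FilterCacheSketch cache = new FilterCacheSketch(index::get);
        System.out.println(cache.numDocs("platform:pc", "year:1870"));
        System.out.println(cache.numDocs("platform:pc", "year:1870"));
        System.out.println("searches run: " + cache.searchesRun());
    }
}
```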



-Hoss

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/11/06, Chris Hostetter <ho...@fucit.org> wrote:
> DocSet does have a "getBits()" method that
> can be used to either access the underlying BitSet of a DocSet, or create
> a new BitSet that represents the DocSet (if the underlying implementation
> isn't already using a BitSet) ... from there you could do BitSet
> operations to your heart's content, and then construct a new DocSet ... but
> getBits is deprecated.

Yeah... that reminds me of why I deprecated it - I'd like to replace
BitSet with my faster implementation that I've been twiddling with
slowly on my own time.  Maybe now is the right time before more people
start using Solr in production...  I'll refresh my memory and see what
else needs to be done.

-Yonik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 11, 2006, at 11:47 AM, Yonik Seeley wrote:
>> Also, we allow for inverted facet selection as well, allowing a user
>> to select all documents that do not have a specified value.
>
> So for a certain facet like "platform:pc", you also allow for
> "-platform:pc"?

Yup!  And it magically is lightning fast with the BitSet stuff I've
implemented.  It is a handy feature in our domain (19th century  
literature).  "Show me all documents in 1870 that Dante Gabriel  
Rossetti did NOT create" - this is done completely with BitSet's when  
no full-text queries are used.  The key thing is that the facets  
return back value/counts for each of the non-zero facets (only the  
values for documents that match the constraints).

> If this is a common enough thing for faceted browsing, we should
> probably build in support for that in the Solr APIs somehow (w/o
> storing DocSets for both).

I'm not sure how common an inverted constraint is, but it certainly  
is key to my world :)

> Do you facet on all terms for a particular set of fields, or are the
> terms to be faceted on defined outside the system?  If the former,
> most of your system would fall into what I would think of as "simple"
> faceted browsing, that should be supported by default some day.  The
> latter isn't too big of a leap either... maybe with the terms defined
> in solrconfig.xml or something.

I'm afraid to let folks outside my group bang on it, but the non-Solr  
architecture (XML-RPC-based Lucene search server) is up and running  
here: http://www.nines.org/search/browse (be nice, and also note that  
it may very well go down as this is not a production-quality  
deployment).  The UI is a bit sluggish because of the fairly large  
(by HTML standards, not Lucene) number of facet values being  
rendered.  But you'll see that you can add any number of  
constraints.  Things get faster to render as the set is constrained.   
The pie charts and numbers are all dynamic based on the current  
constraints.  A constraint can be added in the negative sense by  
clicking the "-", or it can be toggled once added by clicking the "+"  
or "-" link.

The faceted fields are currently hard-coded - they require special  
indexing considerations (indexed, but not tokenized).  And the set of  
values in each field is fairly limited, but the agent (author,  
creator, artist, etc) is the most unconstrained one.  I'm looking  
forward to refactoring for DocSet's to leverage the LRU cache  
goodness for this case as our data grows.

	Erik

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

: > How can I flip a DocSet or
: > achieve the same sort of thing?
:
: Currently not implemented... we either could implement it (flip on a

That's not entirely true ... DocSet does have a "getBits()" method that
can be used to either access the underlying BitSet of a DocSet, or create
a new BitSet that represents the DocSet (if the underlying implementation
isn't already using a BitSet) ... from there you could do BitSet
operations to your heart's content, and then construct a new DocSet ... but
getBits is deprecated.

In the long run, adding all of the methods currently in the BitSet class
to the DocSet interface would be mighty nice.




-Hoss

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/11/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> A couple of questions about DocSet's though, so that I'm confident
> I'll be able to get the same functionality...
>
> Along with a BitSet for each term in selected fields, I also store a
> "catchall" BitSet that is an OR'd BitSet of all term BitSets

An efficient union isn't implemented yet.  The current union() method
creates a new DocSet, and it isn't optimized for speed with
HashDocSets.

I think we'd want to either
 - create a mutatingUnion(DocSet other) to prevent repeated creation
of a new DocSet, or
 - create a union(Collection<DocSet>)
 - or create an addTo(BitSet target)

> How can I flip a DocSet or
> achieve the same sort of thing?

Currently not implemented... we either could implement it (flip on a
HashDocSet will be big though), or implement some stuff like
ChainedFilter (have a NotDocSet that wraps a DocSet).  If memory is a
concern, the latter sounds like the right way to implement that one.
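
For what the wrapper idea might look like, here is a standalone sketch. NotDocSet
does not exist in Solr at the time of this thread; the class below is
hypothetical, wraps a java.util.BitSet rather than a DocSet, and assumes the
wrapped set only contains ids below maxDoc. The point is that negation costs no
extra memory because nothing is flipped or copied.

```java
import java.util.BitSet;

/** Standalone illustration; the name echoes the thread but this is not Solr code. */
final class NotDocSetSketch {
    private final BitSet wrapped;  // the documents to exclude
    private final int maxDoc;      // total number of document ids in the index

    NotDocSetSketch(BitSet wrapped, int maxDoc) {
        // BitSet.length() is the highest set bit + 1, so this rejects out-of-range ids.
        if (wrapped.length() > maxDoc) {
            throw new IllegalArgumentException("wrapped set has ids >= maxDoc");
        }
        this.wrapped = wrapped;
        this.maxDoc = maxDoc;
    }

    /** A document is in the negated set when it is in range and absent from the wrapped set. */
    boolean exists(int doc) {
        return doc >= 0 && doc < maxDoc && !wrapped.get(doc);
    }

    /** Size of the negated set, computed from counts instead of building a flipped copy. */
    int size() {
        return maxDoc - wrapped.cardinality();
    }

    public static void main(String[] args) {
        BitSet pc = new BitSet();
        pc.set(1); pc.set(3);
        NotDocSetSketch notPc = new NotDocSetSketch(pc, 5);
        System.out.println(notPc.exists(0) + " " + notPc.exists(1) + " " + notPc.size());
    }
}
```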

> Also, we allow for inverted facet selection as well, allowing a user
> to select all documents that do not have a specified value.

So for a certain facet like "platform:pc", you also allow for "-platform:pc"?
If this is a common enough thing for faceted browsing, we should
probably build in support for that in the Solr APIs somehow (w/o
storing DocSets for both).

>  I
> currently accomplish this in my loop to build up an aggregate
> constraint BitSet by using its .andNot() method.  How can I
> accomplish this using DocSet's?

It's not there yet, but I'd be in favor of andNot functionality in DocSet.

> If I can achieve these capabilities without too much effort, then my
> DocSet refactoring will happen sooner rather than later :)

Looks like it might be a little later ;-)
It's great to see the requirements that others have though!

Do you facet on all terms for a particular set of fields, or are the
terms to be faceted on defined outside the system?  If the former,
most of your system would fall into what I would think of as "simple"
faceted browsing, that should be supported by default some day.  The
latter isn't too big of a leap either... maybe with the terms defined
in solrconfig.xml or something.

-Yonik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Thanks to Hoss and Yonik again(!) for their valuable assistance  
pointing me to better ways to do what I want with facets within  
Solr's infrastructure.  Very helpful.

At this point I need to pragmatically put the DocSet refactoring on  
hold to accomplish some other things, but I did get the SolrCache and  
firstSearcher event listener working using my BitSet's and will  
tackle the DocSet migration in the near future.

A couple of questions about DocSet's though, so that I'm confident  
I'll be able to get the same functionality...

Along with a BitSet for each term in selected fields, I also store a  
"catchall" BitSet that is an OR'd BitSet of all term BitSets and then  
flipped (using BitSet.or() and .flip()).  How can I flip a DocSet or  
achieve the same sort of thing?  This catchall BitSet is used to show  
"<unspecified>" on the user interface for that field, to allow  
someone to select all documents that do not have any terms in that  
field.
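
For reference, the catchall construction described above looks roughly like the
following with plain java.util.BitSet. The class and method names are made up
for illustration, maxDoc is assumed to be the number of document ids in the
index, and the sketch does not account for deleted documents.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.List;

/** Standalone illustration of the "<unspecified>" bucket for one field. */
final class UnspecifiedFacetSketch {

    /** Documents that have no term at all in the field: NOT (term1 OR term2 OR ...). */
    static BitSet unspecified(Collection<BitSet> termBitSets, int maxDoc) {
        BitSet any = new BitSet(maxDoc);
        for (BitSet termDocs : termBitSets) {
            any.or(termDocs);       // union of every term's documents
        }
        any.flip(0, maxDoc);        // invert within the valid id range only
        return any;
    }

    public static void main(String[] args) {
        BitSet poetry = new BitSet();
        poetry.set(0); poetry.set(1);
        BitSet painting = new BitSet();
        painting.set(1); painting.set(4);
        System.out.println(unspecified(List.of(poetry, painting), 6));
    }
}
```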

Also, we allow for inverted facet selection as well, allowing a user  
to select all documents that do not have a specified value.  I  
currently accomplish this in my loop to build up an aggregate  
constraint BitSet by using its .andNot() method.  How can I  
accomplish this using DocSet's?
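
A standalone sketch of that aggregation loop, again with java.util.BitSet. The
Constraint type is hypothetical and exists only to carry the "inverted" flag;
the actual request handler code is not shown in this thread.

```java
import java.util.BitSet;
import java.util.List;

/** Standalone illustration of building one aggregate constraint from many. */
final class ConstraintSketch {

    /** One user-selected facet value; inverted means "documents WITHOUT this value". */
    static final class Constraint {
        final BitSet docs;
        final boolean inverted;

        Constraint(BitSet docs, boolean inverted) {
            this.docs = docs;
            this.inverted = inverted;
        }
    }

    /** Starts from all documents and narrows with and() or andNot() per constraint. */
    static BitSet apply(List<Constraint> constraints, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (Constraint c : constraints) {
            if (c.inverted) {
                result.andNot(c.docs);  // drop docs that have the value
            } else {
                result.and(c.docs);     // keep only docs that have the value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet year1870 = new BitSet();
        year1870.set(0, 4);             // docs 0..3
        BitSet rossetti = new BitSet();
        rossetti.set(1); rossetti.set(3); rossetti.set(5);

        List<Constraint> constraints = List.of(
                new Constraint(year1870, false),
                new Constraint(rossetti, true));
        System.out.println(apply(constraints, 6));
    }
}
```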

If I can achieve these capabilities without too much effort, then my  
DocSet refactoring will happen sooner rather than later :)

Again thanks for all the help and rapid response.  Most helpful, and  
also shows that Solr is alive, vibrant, and extremely capable.

	Erik

On May 10, 2006, at 5:23 PM, Yonik Seeley wrote:

> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> For a fixed set of fields (currently 4 or so of them) I'm building a
>> HashMap keyed by field name, with the values of each key also a
>> HashMap, keyed by term value.  The value of the inner HashMap is a
>> BitSet representing all documents that have that value for that
>> field.  These BitSets are used for a faceted browser and ANDed
>> together based on user criteria, as well as combined with full-text
>> queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
>> something Solr already helps provide?
>
> Using Solr's DocSet implementations will dramatically speed up your
> faceted browsing and reduce your memory footprint.  You could store
> these DocSets yourself (and turn off the filter cache so things aren't
> doubly stored), but here is how I might go about it:
>
> In your custom cache, just store the terms for the faceting fields
> (everything but the bitsets).
> field1 -> [term1, term2, term3, term4]
> field2 -> [terma, termb, termc, termd]
>
> Then when it comes time to get the count of items matching query x,
> do
>  count1 = searcher.numDocs(x, new TermQuery(new Term("field1", "term1")));
>  count2 = searcher.numDocs(x, new TermQuery(new Term("field1", "term2")));
>  ...
>
> Solr will check the filter cache for "x" and for the TermQuery facets,
> and generate them on the fly if they are not found.
>
> What you lose:
>  - teeny bit of performance because each facet gets looked up in a
> HashMap (I've profiled... this has been negligible for us)
>
> What you gain:
> - re-use of the filtercache (including the filter for the base
> query), much faster intersections with less average memory usage &
> less garbage produced
> - an ability to easily cap the number of filters used for the facets,
> allowing a gradual reduction in performance as cache hits lower,
> rather than an OOM.

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

Something I forgot to mention are other easier alternatives to
associating user data with a particular searcher.

For example, MyData (in your case Map<FieldName,Terms>) could simply
be the single item in a solr user cache.  Another option is like how
Lucene caches FilteredQuery (basically a
WeakHashMap<SolrIndexSearcher,MyStuff>)

In either case, no regenerators or custom listeners are needed... just
configure a request to be sent to your plugin on both firstSearcher
and newSearcher events, and program your plugin to regenerate MyStuff
if it's not in the cache.
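
A minimal sketch of the WeakHashMap variant, written against plain JDK types so
it stands alone. The type parameter S stands in for SolrIndexSearcher and D for
MyStuff; the builder function is hypothetical. Two caveats worth keeping in
mind: WeakHashMap compares keys with equals()/hashCode(), and the stored data
must not hold a strong reference back to its searcher, or the entry can never
be collected.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Standalone illustration: per-searcher data that is rebuilt lazily on a miss. */
final class PerSearcherDataSketch<S, D> {
    private final Map<S, D> bySearcher = new WeakHashMap<>();
    private final Function<S, D> builder;  // hypothetical: builds the data from a searcher

    PerSearcherDataSketch(Function<S, D> builder) {
        this.builder = builder;
    }

    /** Returns the data for this searcher, building it the first time it is asked for. */
    synchronized D get(S searcher) {
        D data = bySearcher.get(searcher);
        if (data == null) {
            data = builder.apply(searcher);
            bySearcher.put(searcher, data);
        }
        return data;
    }

    public static void main(String[] args) {
        PerSearcherDataSketch<Object, String> cache =
                new PerSearcherDataSketch<>(s -> "facet terms for " + s.hashCode());
        Object searcher = new Object();
        // The second call returns the same instance without rebuilding.
        System.out.println(cache.get(searcher) == cache.get(searcher));
    }
}
```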

-Yonik

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

: way to get the name at the time.  Many parts of Solr were done in an
: extreme rapid-apps type environment... I implemented it as fast as I
: could, no peer review, often past midnight, etc ;-)

Ah the good old days, when I'd send Yonik mail ~5PM Pacific requesting a
feature that i needed in my plugin, assuming he'd have replied with an
estimate of how long it would take by the time i got into work the
following morning (he's on the east coast)... only to get a surprise email
from him at midnight Pacific saying that he had a prototype ready if i
wanted to try it and was going to bed ... i'd play with it and give him
some feedback on the API and then when i'd show up at the office around
10AM Pacific the next morning, he'd have been working on it for 3 hours
already and already be done with the damn thing.

: >  Also, it is confusing because there is a name() and getName()
: > methods required to implement a SolrCache.

It's been bugging me that so many of those "plugin" related classes don't
have in depth javadocs .. this thread has prompted me to make a list of
the "biggies" on the TaskList ... it will give me something to do next week
when i'm not allowed in my office building because they're moving
everyone.


-Hoss

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Odd :)  But, I've adjusted my code to account for this.  Why would
> you ever want different names than what is specified in the config
> file?

You probably wouldn't... as far as I remember, it was just the easiest
way to get the name at the time.  Many parts of Solr were done in an
extreme rapid-apps type environment... I implemented it as fast as I
could, no peer review, often past midnight, etc ;-)

The cache name can be nice for the cache to have when implementing
logging or toString() for instance.  Since I already had the name, I
used it as the key.  Something like CacheConfig (it represents the
entry in solrconfig.xml) could have a getCacheName(), removing the
need for the cache to keep track of it.

>  Also, it is confusing because there is a name() and getName()
> methods required to implement a SolrCache.

Definitely... I hadn't even noticed that before :-)

-Yonik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 10, 2006, at 6:38 PM, Erik Hatcher wrote:
> On May 10, 2006, at 5:31 PM, Chris Hostetter wrote:
>
>>
>> : Great question... it should be, as it's created and registered in the
>> : SolrIndexSearcher constructor.  is the cache .name() returning the
>> : right thing?
>>
>> I was just about to ask that. it wasn't until i started digging into
>> CacheConfig and SolrIndexSearcher because of this thread that i realized
>> it doesn't matter what "name" attribute you give your cache in the config,
>> the SolrCache implementation itself is responsible for specifying the name
>> that can be used to access the cache with SolrIndexSearcher.getCache()
>>
>> if you define your MyCache.name function to be...
>>
>>    public String name() { return "foo"; }
>>
>> then even if you have...
>>
>>      <cache name="facet_cache"
>>        class="org.foo.MyCache"
>>      />
>>
>> ...you'll access your cache using the name "foo".
>
> Ah, that was an issue in my code then.  I simply had all of the
> "unnecessary" methods implemented returning a default value (null
> for Object return values).  I now return the value of
> args.get("name") which is "facet_cache" in my case... but I'm still
> getting the same NPE.

Uh, never mind.... my e-mail was written over the course of a few  
minutes as I was trying things, and I inadvertently returned the name  
from getName() instead of name().  All is now well.

	Erik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 10, 2006, at 5:31 PM, Chris Hostetter wrote:

>
> : Great question... it should be, as it's created and registered in the
> : SolrIndexSearcher constructor.  is the cache .name() returning the
> : right thing?
>
> I was just about to ask that. it wasn't until i started digging into
> CacheConfig and SolrIndexSearcher because of this thread that i realized
> it doesn't matter what "name" attribute you give your cache in the config,
> the SolrCache implementation itself is responsible for specifying the name
> that can be used to access the cache with SolrIndexSearcher.getCache()
>
> if you define your MyCache.name function to be...
>
>    public String name() { return "foo"; }
>
> then even if you have...
>
>      <cache name="facet_cache"
>        class="org.foo.MyCache"
>      />
>
> ...you'll access your cache using the name "foo".

Ah, that was an issue in my code then.  I simply had all of the  
"unnecessary" methods implemented returning a default value (null for  
Object return values).  I now return the value of args.get("name")  
which is "facet_cache" in my case... but I'm still getting the same NPE.

> if you want to pay attention to the name specified in the config, that's
> the responsibility of your init method to get it from the Map of args.

Odd :)  But, I've adjusted my code to account for this.  Why would  
you ever want different names than what is specified in the config  
file?  Also, it is confusing because there is a name() and getName()  
methods required to implement a SolrCache.

	Erik

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

: Great question... it should be, as it's created and registered in the
: SolrIndexSearcher constructor.  is the cache .name() returning the
: right thing?

I was just about to ask that. it wasn't until i started digging into
CacheConfig and SolrIndexSearcher because of this thread that i realized
it doesn't matter what "name" attribute you give your cache in the config,
the SolrCache implementation itself is responsible for specifying the name
that can be used to access the cache with SolrIndexSearcher.getCache()

if you define your MyCache.name function to be...

   public String name() { return "foo"; }

then even if you have...

     <cache name="facet_cache"
       class="org.foo.MyCache"
     />

...you'll access your cache using the name "foo".

if you want to pay attention to the name specified in the config, that's
the responsibility of your init method to get it from the Map of args.

(you didn't mention what your name() method looks like, but you did
include your init method, and i can see you aren't looking at the Map at
all)
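
To illustrate the point, here is a standalone sketch of a cache that picks its
name up from the init args, plus a tiny registry keyed by name(). Neither class
is Solr code: the real SolrCache.init takes more arguments, and the registry is
only a guess at the behaviour described above, namely that the searcher looks
caches up by whatever name() returns.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that remembers the "name" attribute it was configured with. */
final class NamedCacheSketch {
    private String name;  // stays null until init() is called

    /** Stand-in for the init hook: read the configured name out of the args map. */
    void init(Map<String, String> args) {
        this.name = args.get("name");
    }

    /** Whatever this returns is the key the cache ends up registered under. */
    String name() {
        return name;
    }

    public static void main(String[] args) {
        NamedCacheSketch cache = new NamedCacheSketch();
        cache.init(Map.of("name", "facet_cache"));
        CacheRegistrySketch registry = new CacheRegistrySketch();
        registry.register(cache);
        System.out.println(registry.getCache("facet_cache") == cache);
    }
}

/** Hypothetical registry: assumes lookup is by name(), as described in this thread. */
final class CacheRegistrySketch {
    private final Map<String, NamedCacheSketch> caches = new HashMap<>();

    void register(NamedCacheSketch cache) {
        caches.put(cache.name(), cache);
    }

    NamedCacheSketch getCache(String name) {
        return caches.get(name);  // null when name() returned something else
    }
}
```

If init() never records the name, name() returns null, the lookup by
"facet_cache" comes back null, and calling a method on the result throws the
kind of NullPointerException seen earlier in the thread.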



-Hoss

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> For a fixed set of fields (currently 4 or so of them) I'm building a
> HashMap keyed by field name, with the values of each key also a
> HashMap, keyed by term value.  The value of the inner HashMap is a
> BitSet representing all documents that have that value for that
> field.  These BitSets are used for a faceted browser and ANDed
> together based on user criteria, as well as combined with full-text
> queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
> something Solr already helps provide?

Using Solr's DocSet implementations will dramatically speed up your
faceted browsing and reduce your memory footprint.  You could store
these DocSets yourself (and turn off the filter cache so things aren't
doubly stored), but here is how I might go about it:

In your custom cache, just store the terms for the faceting fields
(everything but the bitsets).
field1 -> [term1, term2, term3, term4]
field2 -> [terma, termb, termc, termd]

Then when it comes time to get the count of items matching query x,
do
  count1 = searcher.numDocs(x, new TermQuery(new Term("field1", "term1")));
  count2 = searcher.numDocs(x, new TermQuery(new Term("field1", "term2")));
  ...

Solr will check the filter cache for "x" and for the TermQuery facets,
and generate them on the fly if they are not found.

What you lose:
  - teeny bit of performance because each facet gets looked up in a
HashMap (I've profiled... this has been negligible for us)

What you gain:
 - re-use of the filtercache (including the filter for the base
query), much faster intersections with less average memory usage &
less garbage produced
 - an ability to easily cap the number of filters used for the facets,
allowing a gradual reduction in performance as cache hits lower,
rather than an OOM.
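
The per-term counting loop can be sketched without Solr as follows. BitSets
stand in for the cached DocSets, the map of term to documents stands in for
what the filterCache would supply, and all names are made up for illustration.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Standalone illustration of counting facet values against a base query. */
final class FacetCountSketch {

    /** For each term of one field, counts the docs matching both the term and the base query. */
    static Map<String, Integer> counts(BitSet baseQueryDocs, Map<String, BitSet> termDocs) {
        Map<String, Integer> result = new LinkedHashMap<>();  // keeps the terms in input order
        for (Map.Entry<String, BitSet> entry : termDocs.entrySet()) {
            BitSet both = (BitSet) entry.getValue().clone();  // do not disturb the stored set
            both.and(baseQueryDocs);
            result.put(entry.getKey(), both.cardinality());
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet x = new BitSet();
        x.set(0); x.set(1); x.set(2);

        Map<String, BitSet> field1 = new LinkedHashMap<>();
        BitSet term1 = new BitSet();
        term1.set(1); term1.set(2); term1.set(3);
        BitSet term2 = new BitSet();
        term2.set(4);
        field1.put("term1", term1);
        field1.put("term2", term2);

        System.out.println(counts(x, field1));
    }
}
```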

> The question still remains - why isn't my cache available from a
> firstSearcher .newSearcher() method?  The cache is created prior (as
> noted in the console output).

Great question... it should be, as it's created and registered in the
SolrIndexSearcher constructor.  is the cache .name() returning the
right thing?

-Yonik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 10, 2006, at 4:11 PM, Yonik Seeley wrote:
> Unless you really need special cache behavior, can you use the
> LRUCache that comes with Solr?

Sure, I suppose I could use that, but it had more bells and whistles  
than I need.  I like to look at the interfaces and work from there  
and then use base classes that fit.  LRUCache didn't seem to fit what  
I wanted without contorting its configuration.

>   It will make your life easier using a
> well tested cache implementation.

My cache is really just a static cache of BitSet's for a fixed set of  
fields and their values.  With my current index size, creating the  
cache is incredibly fast (a second or so), but the index will grow  
much larger.

> You shouldn't really need to implement your own Listener either (while
> valid, it's a tougher approach).  You could just send your plugin a
> message via the builtin QuerySenderListener.
>
>    <listener event="firstSearcher" class="solr.QuerySenderListener">
>      <arr name="queries">
>        <lst> <str name="q">fast_warm</str> <str name="start">0</str>
>              <str name="rows">10</str> </lst>
>      </arr>
>    </listener>

I see.... which is basically what I was already doing by having the  
cache lazy initialize on the first request in the request handler  
(aka plugin), except the first request is coming from startup thanks  
to the listener architecture.

> If you do choose to implement your own listener, you need to register
> it, like above.

I did, but omitted it from my previous details:

	<listener event="firstSearcher" class="org.nines.CacheFacetsListener"/>

> Details on how your facet cache is supposed to work might help with
> answering future questions.

For a fixed set of fields (currently 4 or so of them) I'm building a  
HashMap keyed by field name, with the values of each key also a  
HashMap, keyed by term value.  The value of the inner HashMap is a  
BitSet representing all documents that have that value for that  
field.  These BitSets are used for a faceted browser and ANDed  
together based on user criteria, as well as combined with full-text  
queries using QueryFilter's BitSet.  Nothing fancy, and perhaps  
something Solr already helps provide?

The question still remains - why isn't my cache available from a  
firstSearcher .newSearcher() method?  The cache is created prior (as  
noted in the console output).

	Erik



>
> -Yonik
>
> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>>
>> On May 10, 2006, at 3:01 PM, Yonik Seeley wrote:
>>
>> > On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> >> I've started down this route, but I'm not sure how to initialize my
>> >> cache the first time.
>> >>
>> >> I need access to the IndexReader to build the cache, and at this
>> >> point I don't need any incremental cache updates - if a new
>> >> IndexSearcher is swapped in, I want to rebuild the cache.
>> >>
>> >> Should I combine a custom SolrCache with a newSearcher listener to
>> >> have it generated right away?   I put in a dummy cache and
>> >> regenerator, but only see the cache .init() method being called, not
>> >> the warm() method (on application server startup).  How can I
>> >> bootstrap it such that my cache gets built on app. startup?
>> >
>> > If your cache is populated as the result of any request to your
>> > plugin, simply send a request via a firstSearcher hook.  If it's not
>> > populated for any request, then send a special request that your
>> > plugin would recognize as a "populate cache" request.
>>
>> Sorry I'm being dense today, though I really do appreciate the
>> incredibly fast response time you and Hoss have on this.  My cache is
>> not available in newSearcher() at startup time:
>>
>> public class CacheFacetsListener implements SolrEventListener {
>>    public void init(NamedList namedList) {
>>    }
>>
>>    public void postCommit() {
>>      throw new UnsupportedOperationException();
>>    }
>>
>>    public void newSearcher(SolrIndexSearcher newSearcher,
>> SolrIndexSearcher currentSearcher) {
>>      try {
>>        SolrCache cache = newSearcher.getCache("facet_cache");
>>        if (cache == null) {
>>          System.out.println("!!!!! cache is null");
>>        }
>>        cache.warm(newSearcher, null);
>>      } catch (IOException e) {
>>        log.severe(e.getMessage());
>>      }
>>    }
>> }
>>
>> I'm getting the "cache is null" message.  Though the cache is created
>> and init()'d as I see its diagnostic output in Jetty's console
>> before the NPE:
>>
>> public class FacetCache implements SolrCache {
>>    private Map facetCache;  // key is field, and key to inner map is value
>>    private State state;
>>
>>    private void loadFacets(IndexReader reader) throws IOException {
>>      System.out.println("Loading facets for " + reader.numDocs()
>>          + " documents ...");
>>
>>      // ....
>>
>>      System.out.println("Done loading facets.");
>>    }
>>
>>
>>    public Object init(Map args, Object persistence, CacheRegenerator
>> regenerator) {
>>      state=State.CREATED;
>>      System.out.println("<<<<< FacetCache.init >>>>>");
>>      return persistence;
>>    }
>>
>>    // ......
>>
>> }
>>
>> And from solrconfig.xml:
>>
>>      <cache name="facet_cache"
>>        class="org.nines.FacetCache"
>>      />
>>
>> The console has this output:
>>
>> May 10, 2006 3:44:26 PM org.apache.solr.search.SolrIndexSearcher <init>
>> INFO: Opening Searcher@cfe790 main
>> <<<<< FacetCache.init >>>>>
>> May 10, 2006 3:44:26 PM org.apache.solr.core.SolrCore registerSearcher
>> INFO: Registered new searcher Searcher@cfe790 main
>> !!!!! cache is null
>> May 10, 2006 3:44:26 PM org.apache.solr.core.SolrException log
>> SEVERE: java.lang.NullPointerException
>>          at org.nines.CacheFacetsListener.newSearcher(CacheFacetsListener.java:24)
>>          at org.apache.solr.core.SolrCore$2.call(SolrCore.java:427)
>>
>>
>> I'm probably making this more difficult than it needs to be, but
>> today I'm slow :)   What am I doing wrong?
>>
>> Thanks,
>>         Erik

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

Unless you really need special cache behavior, can you use the
LRUCache that comes with Solr?  It will make your life easier using a
well tested cache implementation.  You can size it large enough so
that items never get dropped if that's the issue.
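
For readers unfamiliar with the idea, a size-capped LRU cache can be sketched in
a few lines on top of java.util.LinkedHashMap. This is not Solr's LRUCache and
says nothing about how that class is implemented or configured; it only shows
the eviction behaviour, and why a large enough cap means nothing is ever
dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Standalone illustration of least-recently-used eviction with a fixed cap. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruSketch(int maxSize) {
        super(16, 0.75f, true);  // accessOrder=true: get() moves an entry to the "recent" end
        this.maxSize = maxSize;
    }

    /** Called by LinkedHashMap after each put; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruSketch<String, Integer> lru = new LruSketch<>(2);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");        // "a" is now more recently used than "b"
        lru.put("c", 3);     // exceeds the cap, so the least recently used entry goes
        System.out.println(lru.keySet());
    }
}
```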

You shouldn't really need to implement your own Listener either (while
valid, it's a tougher approach).  You could just send your plugin a
message via the builtin QuerySenderListener.

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="q">fast_warm</str> <str name="start">0</str>
              <str name="rows">10</str> </lst>
      </arr>
    </listener>

If you do choose to implement your own listener, you need to register
it, like above.
Details on how your facet cache is supposed to work might help with
answering future questions.

-Yonik

On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On May 10, 2006, at 3:01 PM, Yonik Seeley wrote:
>
> > On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> >> I've started down this route, but I'm not sure how to initialize my
> >> cache the first time.
> >>
> >> I need access to the IndexReader to build the cache, and at this
> >> point I don't need any incremental cache updates - if a new
> >> IndexSearcher is swapped in, I want to rebuild the cache.
> >>
> >> Should I combine a custom SolrCache with a newSearcher listener to
> >> have it generated right away?   I put in a dummy cache and
> >> regenerator, but only see the cache .init() method being called, not
> >> the warm() method (on application server startup).  How can I
> >> bootstrap it such that my cache gets built on app. startup?
> >
> > If your cache is populated as the result of any request to your
> > plugin, simply send a request via a firstSearcher hook.  If it's not
> > populated for any request, then send a special request that your
> > plugin would recognize as a "populate cache" request.
>
> Sorry I'm being dense today, though I really do appreciate the
> incredibly fast response time you and Hoss have on this.  My cache is
> not available in newSearcher() at startup time:
>
> public class CacheFacetsListener implements SolrEventListener {
>    public void init(NamedList namedList) {
>    }
>
>    public void postCommit() {
>      throw new UnsupportedOperationException();
>    }
>
>    public void newSearcher(SolrIndexSearcher newSearcher,
> SolrIndexSearcher currentSearcher) {
>      try {
>        SolrCache cache = newSearcher.getCache("facet_cache");
>        if (cache == null) {
>          System.out.println("!!!!! cache is null");
>        }
>        cache.warm(newSearcher, null);
>      } catch (IOException e) {
>        log.severe(e.getMessage());
>      }
>    }
> }
>
> I'm getting the "cache is null" message.  Though the cache is created
> and init()'d as I see its diagnostic output in Jetty's console
> before the NPE:
>
> public class FacetCache implements SolrCache {
>    private Map facetCache;  // key is field, and key to inner map is value
>    private State state;
>
>    private void loadFacets(IndexReader reader) throws IOException {
>      System.out.println("Loading facets for " + reader.numDocs()
>          + " documents ...");
>
>      // ....
>
>      System.out.println("Done loading facets.");
>    }
>
>
>    public Object init(Map args, Object persistence,
>                       CacheRegenerator regenerator) {
>      state=State.CREATED;
>      System.out.println("<<<<< FacetCache.init >>>>>");
>      return persistence;
>    }
>
>    // ......
>
> }
>
> And from solrconfig.xml:
>
>      <cache name="facet_cache"
>        class="org.nines.FacetCache"
>      />
>
> The console has this output:
>
> May 10, 2006 3:44:26 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@cfe790 main
> <<<<< FacetCache.init >>>>>
> May 10, 2006 3:44:26 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: Registered new searcher Searcher@cfe790 main
> !!!!! cache is null
> May 10, 2006 3:44:26 PM org.apache.solr.core.SolrException log
> SEVERE: java.lang.NullPointerException
>          at org.nines.CacheFacetsListener.newSearcher
> (CacheFacetsListener.java:24)
>          at org.apache.solr.core.SolrCore$2.call(SolrCore.java:427)
>
>
> I'm probably making this more difficult than it needs to be, but
> today I'm slow :)   What am I doing wrong?
>
> Thanks,
>         Erik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On May 10, 2006, at 3:01 PM, Yonik Seeley wrote:

> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> I've started down this route, but I'm not sure how to initialize my
>> cache the first time.
>>
>> I need access to the IndexReader to build the cache, and at this
>> point I don't need any incremental cache updates - if a new
>> IndexSearcher is swapped in, I want to rebuild the cache.
>>
>> Should I combine a custom SolrCache with a newSearcher listener to
>> have it generated right away?   I put in a dummy cache and
>> regenerator, but only see the cache .init() method being called, not
>> the warm() method (on application server startup).  How can I
>> bootstrap it such that my cache gets built on app. startup?
>
> If your cache is populated as the result of any request to your
> plugin, simply send a request via a firstSearcher hook.  If it's not
> populated for any request, then send a special request that your
> plugin would recognize as a "populate cache" request.

Sorry I'm being dense today, though I really do appreciate the  
incredibly fast response time you and Hoss have on this.  My cache is  
not available in newSearcher() at startup time:

public class CacheFacetsListener implements SolrEventListener {
   public void init(NamedList namedList) {
   }

   public void postCommit() {
     throw new UnsupportedOperationException();
   }

   public void newSearcher(SolrIndexSearcher newSearcher,
                           SolrIndexSearcher currentSearcher) {
     try {
       SolrCache cache = newSearcher.getCache("facet_cache");
       if (cache == null) {
         System.out.println("!!!!! cache is null");
       }
       cache.warm(newSearcher, null);
     } catch (IOException e) {
       log.severe(e.getMessage());
     }
   }
}

I'm getting the "cache is null" message, even though the cache is
created and init()'d, as I can see its diagnostic output in Jetty's
console before the NPE:

public class FacetCache implements SolrCache {
   // key is field, and key to inner map is value
   private Map facetCache;
   private State state;

   private void loadFacets(IndexReader reader) throws IOException {
     System.out.println("Loading facets for " + reader.numDocs()
         + " documents ...");

     // ....

     System.out.println("Done loading facets.");
   }


   public Object init(Map args, Object persistence,
                      CacheRegenerator regenerator) {
     state=State.CREATED;
     System.out.println("<<<<< FacetCache.init >>>>>");
     return persistence;
   }

   // ......

}

And from solrconfig.xml:

     <cache name="facet_cache"
       class="org.nines.FacetCache"
     />

The console has this output:

May 10, 2006 3:44:26 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@cfe790 main
<<<<< FacetCache.init >>>>>
May 10, 2006 3:44:26 PM org.apache.solr.core.SolrCore registerSearcher
INFO: Registered new searcher Searcher@cfe790 main
!!!!! cache is null
May 10, 2006 3:44:26 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.NullPointerException
         at org.nines.CacheFacetsListener.newSearcher 
(CacheFacetsListener.java:24)
         at org.apache.solr.core.SolrCore$2.call(SolrCore.java:427)


I'm probably making this more difficult than it needs to be, but  
today I'm slow :)   What am I doing wrong?

Thanks,
	Erik

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I've started down this route, but I'm not sure how to initialize my
> cache the first time.
>
> I need access to the IndexReader to build the cache, and at this
> point I don't need any incremental cache updates - if a new
> IndexSearcher is swapped in, I want to rebuild the cache.
>
> Should I combine a custom SolrCache with a newSearcher listener to
> have it generated right away?   I put in a dummy cache and
> regenerator, but only see the cache .init() method being called, not
> the warm() method (on application server startup).  How can I
> bootstrap it such that my cache gets built on app. startup?

If your cache is populated as the result of any request to your
plugin, simply send a request via a firstSearcher hook.  If it's not
populated for any request, then send a special request that your
plugin would recognize as a "populate cache" request.
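
To make the "populate cache" idea concrete, here is a rough stand-alone
sketch. It deliberately uses no Solr classes: the handler, the plain Map of
parameters, and the "populate_cache" parameter name are all made up for
illustration, so treat it as the shape of the logic rather than as actual
plugin code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone sketch (no Solr classes): a handler that treats a request
 * carrying a hypothetical "populate_cache" parameter as a warm-up request.
 */
class PopulateCacheHandlerSketch {
    // Hypothetical parameter name; a real plugin would define its own.
    static final String POPULATE_PARAM = "populate_cache";

    private Map<String, Integer> facetCache; // stands in for the real facet data
    private int loads = 0;                   // counts how often the cache was built

    /** Handles one request, given as a plain map of parameters. */
    String handle(Map<String, String> params) {
        if ("1".equals(params.get(POPULATE_PARAM))) {
            load(); // forced rebuild, e.g. triggered from a firstSearcher hook
            return "cache populated";
        }
        if (facetCache == null) {
            load(); // lazy fallback if no warm-up request arrived first
        }
        return "facets: " + facetCache;
    }

    private void load() {
        Map<String, Integer> fresh = new HashMap<>();
        fresh.put("genre", 3); // placeholder data instead of reading an index
        facetCache = fresh;
        loads++;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        PopulateCacheHandlerSketch handler = new PopulateCacheHandlerSketch();
        Map<String, String> warmup = new HashMap<>();
        warmup.put(POPULATE_PARAM, "1");
        System.out.println(handler.handle(warmup));
        System.out.println(handler.handle(new HashMap<>()));
    }
}
```

The lazy fallback in handle() is only there so that a normal request still
works if the warm-up request never arrives.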

-Yonik

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I've started down this route, but I'm not sure how to initialize my  
cache the first time.

I need access to the IndexReader to build the cache, and at this  
point I don't need any incremental cache updates - if a new  
IndexSearcher is swapped in, I want to rebuild the cache.

Should I combine a custom SolrCache with a newSearcher listener to  
have it generated right away?   I put in a dummy cache and  
regenerator, but only see the cache .init() method being called, not  
the warm() method (on application server startup).  How can I  
bootstrap it such that my cache gets built on app. startup?

Thanks,
	Erik

On May 10, 2006, at 1:20 PM, Yonik Seeley wrote:

> Here's an example of the configuration from solrconfig.xml:
>
>    <!-- Example of a generic cache.  These caches may be accessed  
> by name
>         through SolrIndexSearcher.getCache(),cacheLookup(), and  
> cacheInsert().
>         The purpose is to enable easy caching of user/application  
> level data.
>         The regenerator argument should be specified as an  
> implementation
>         of solr.search.CacheRegenerator if autowarming is desired.   
> -->
>    <!--
>    <cache name="myUserCache"
>      class="solr.LRUCache"
>      size="4096"
>      initialSize="1024"
>      autowarmCount="1024"
>      regenerator="org.mycompany.mypackage.MyRegenerator"
>      />
>    -->
>
> -Yonik
>
> On 5/10/06, Yonik Seeley <ys...@gmail.com> wrote:
>> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> > I build a "facet" cache in my request handler, but I need it to get
>> > refreshed when the index changes.  How can my custom request  
>> handler
>> > manage this cache and get notified when the index changes?
>>
>> The easiest way is to let Solr keep the cache (use a custom user  
>> cache
>> defined in the solrconfig.xml) and implement a Regenerator that is
>> called to create and refresh a new instance when the searcher is
>> changed.
>>
>> Does that suit your needs?
>>
>> -Yonik
>>

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

Here's an example of the configuration from solrconfig.xml:

    <!-- Example of a generic cache.  These caches may be accessed by name
         through SolrIndexSearcher.getCache(),cacheLookup(), and cacheInsert().
         The purpose is to enable easy caching of user/application level data.
         The regenerator argument should be specified as an implementation
         of solr.search.CacheRegenerator if autowarming is desired.  -->
    <!--
    <cache name="myUserCache"
      class="solr.LRUCache"
      size="4096"
      initialSize="1024"
      autowarmCount="1024"
      regenerator="org.mycompany.mypackage.MyRegenerator"
      />
    -->

-Yonik

On 5/10/06, Yonik Seeley <ys...@gmail.com> wrote:
> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> > I build a "facet" cache in my request handler, but I need it to get
> > refreshed when the index changes.  How can my custom request handler
> > manage this cache and get notified when the index changes?
>
> The easiest way is to let Solr keep the cache (use a custom user cache
> defined in the solrconfig.xml) and implement a Regenerator that is
> called to create and refresh a new instance when the searcher is
> changed.
>
> Does that suit your needs?
>
> -Yonik
>

Re: request handler and caches

Posted by Chris Hostetter <ho...@fucit.org>.

: The easiest way is to let Solr keep the cache (use a custom user cache
: defined in the solrconfig.xml) and implement a Regenerator that is
: called to create and refresh a new instance when the searcher is
: changed.

If you have some reason why you can't or don't want to use a SolrCache,
the other trick you can use is to have a special query param on your
RequestHandler that tells it to do a bunch of work that will load the data
into your cache, and then configure firstSearcher and newSearcher events
to "ping" your handler with that param...

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst> <str name="qt">your_custom_handler</str>
              <str name="force_cache_update">1</str>
        </lst>
      </arr>
    </listener>

I recommend using a SolrCache instead of this approach though, because
Regenerators for SolrCaches have the benefit of knowing state from the
old cache when they populate the new cache ... but this approach works
really well for "pre-filling" a cache on server start up (firstSearcher).
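
As a rough illustration of what "knowing state from the old cache" buys
you, here is a stand-alone sketch. It models the caches as plain Maps and
the regenerator as a static method; the names warm, recompute, and
autowarmCount are illustrative stand-ins, not the signatures of Solr's
CacheRegenerator interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Stand-alone sketch: warm a new cache by recomputing keys the old cache held. */
class RegeneratorSketch {

    /**
     * Builds the cache for a new searcher. Only keys present in oldCache are
     * recomputed, up to autowarmCount entries; recompute stands in for
     * reading the value from the new index.
     */
    static <K, V> Map<K, V> warm(Map<K, V> oldCache,
                                 Function<K, V> recompute,
                                 int autowarmCount) {
        Map<K, V> fresh = new LinkedHashMap<>();
        for (K key : oldCache.keySet()) {
            if (fresh.size() >= autowarmCount) {
                break; // stop once the warm-up budget is used
            }
            fresh.put(key, recompute.apply(key));
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, Integer> old = new LinkedHashMap<>();
        old.put("genre", 3);
        old.put("year", 7);
        // Placeholder recompute function: a real one would consult the new index.
        Map<String, Integer> fresh = warm(old, key -> key.length(), 10);
        System.out.println(fresh);
    }
}
```

The old cache is never modified, so the old searcher can keep serving
requests from it while the new one is being warmed.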


-Hoss

Re: request handler and caches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

It probably suits my needs perfectly!   My apologies for asking yet  
another question that can be answered by reading the configuration  
file :)

I'm refactoring now to use a SolrCache and a regenerator to see how
it goes.

	Erik

On May 10, 2006, at 1:18 PM, Yonik Seeley wrote:

> On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> I build a "facet" cache in my request handler, but I need it to get
>> refreshed when the index changes.  How can my custom request handler
>> manage this cache and get notified when the index changes?
>
> The easiest way is to let Solr keep the cache (use a custom user cache
> defined in the solrconfig.xml) and implement a Regenerator that is
> called to create and refresh a new instance when the searcher is
> changed.
>
> Does that suit your needs?
>
> -Yonik

Re: request handler and caches

Posted by Yonik Seeley <ys...@gmail.com>.

On 5/10/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I build a "facet" cache in my request handler, but I need it to get
> refreshed when the index changes.  How can my custom request handler
> manage this cache and get notified when the index changes?

The easiest way is to let Solr keep the cache (use a custom user cache
defined in the solrconfig.xml) and implement a Regenerator that is
called to create and refresh a new instance when the searcher is
changed.

Does that suit your needs?
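
A rough stand-alone sketch of what "let Solr keep the cache" means for the
handler: it holds no static cache of its own and instead asks whichever
searcher it is given for the named cache on each request. The Searcher
class and lookupFacets method below are hypothetical stand-ins, not Solr
classes; only the "facet_cache" name is borrowed from this thread.

```java
import java.util.HashMap;
import java.util.Map;

class SearcherOwnedCacheSketch {

    /** Hypothetical stand-in for a searcher that owns its named user caches. */
    static class Searcher {
        final int indexVersion;
        private final Map<String, Map<String, Object>> caches = new HashMap<>();

        Searcher(int indexVersion, String... cacheNames) {
            this.indexVersion = indexVersion;
            for (String name : cacheNames) {
                caches.put(name, new HashMap<>());
            }
        }

        /** Returns the named cache, or null if no cache with that name exists. */
        Map<String, Object> getCache(String name) {
            return caches.get(name);
        }
    }

    /** The handler keeps no cache of its own; it asks the searcher every time. */
    static Object lookupFacets(Searcher searcher, String field) {
        Map<String, Object> cache = searcher.getCache("facet_cache");
        if (cache == null) {
            throw new IllegalStateException("facet_cache is not configured");
        }
        // Placeholder value; a real handler would compute facets from the index.
        return cache.computeIfAbsent(field,
            f -> "facets for " + f + " @v" + searcher.indexVersion);
    }

    public static void main(String[] args) {
        Searcher first = new Searcher(1, "facet_cache");
        Searcher second = new Searcher(2, "facet_cache"); // swapped in after a commit
        System.out.println(lookupFacets(first, "genre"));
        System.out.println(lookupFacets(second, "genre"));
    }
}
```

Because each searcher carries its own cache, a swap never leaves the
handler holding data computed against the previous index.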

-Yonik