Posted to solr-user@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2007/02/02 20:50:31 UTC

convert custom facets to Solr facets...

Before Solr had facets, I built my own implementation in a much  
cruder and less performant way into Collex as custom request handlers.

Now the performance issue of warming up the cache needs to be  
addressed.  I'm going to upgrade Solr and adjust the application to  
work with the built-in faceting and see how far I get with that.  The  
dilemma is that I've got a couple of custom things that don't map to  
the built-in faceting and I'm looking for advice on how to proceed.

The index has a "type" field: "A" for archived objects and "C" for  
collectibles.  All the original objects are indexed in batch fashion  
as type "A".  Users collect objects and tag/annotate them.  When a
user collects an object, a document of type "C" is indexed with the
original object's unique identifier (a URI), the username, tags, and
annotation.  My custom facet cache differs from the built-in facets  
in that it builds a cross-reference cache from the "C" types to the  
"A" types (a JOIN, heh).

We can do queries that return facet counts such as:

   - all collected objects
   - all objects collected by erikhatcher
   - all collected objects with tag "foo"

One of the facet counts returned is user, so you can easily see how  
many objects each user has collected.
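
As a rough illustration of the "A"/"C" model above, here is a minimal in-memory sketch in Java.  The class and field names (CollectibleDoc, objectUri, and so on) and the sample data are invented for the example; this is not Collex or Solr code.  It only shows the kind of cross-reference from "C" documents back to "A" objects that yields the counts listed.

```java
import java.util.*;

// Hypothetical stand-in for a type "C" document; not a Solr or Collex class.
class CollectibleDoc {
    final String objectUri;   // unique identifier of the collected "A" object
    final String username;
    final Set<String> tags;

    CollectibleDoc(String objectUri, String username, String... tags) {
        this.objectUri = objectUri;
        this.username = username;
        this.tags = new TreeSet<>(Arrays.asList(tags));
    }
}

class CrossReferenceSketch {
    // username -> distinct "A" object URIs that user has collected
    static Map<String, Set<String>> objectsByUser(List<CollectibleDoc> cDocs) {
        Map<String, Set<String>> byUser = new TreeMap<>();
        for (CollectibleDoc c : cDocs) {
            byUser.computeIfAbsent(c.username, k -> new TreeSet<>()).add(c.objectUri);
        }
        return byUser;
    }

    // distinct "A" object URIs collected with the given tag, by any user
    static Set<String> objectsWithTag(List<CollectibleDoc> cDocs, String tag) {
        Set<String> uris = new TreeSet<>();
        for (CollectibleDoc c : cDocs) {
            if (c.tags.contains(tag)) uris.add(c.objectUri);
        }
        return uris;
    }

    // Made-up sample data.
    static List<CollectibleDoc> sample() {
        return Arrays.asList(
            new CollectibleDoc("uri:obj1", "erikhatcher", "foo"),
            new CollectibleDoc("uri:obj2", "erikhatcher", "bar"),
            new CollectibleDoc("uri:obj1", "someoneelse", "foo", "baz"));
    }

    public static void main(String[] args) {
        List<CollectibleDoc> cDocs = sample();
        // user facet: how many objects each user has collected
        objectsByUser(cDocs).forEach((user, uris) ->
            System.out.println(user + ": " + uris.size()));
        System.out.println("tag foo: " + objectsWithTag(cDocs, "foo").size());
    }
}
```

Counting distinct object URIs rather than "C" documents is the point of the cross-reference: two users collecting the same object still count as one object for a tag.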

For the basic faceting we do on object metadata, this will fit well  
with what Solr has built-in, but I'm not quite sure how to build in  
the cross-reference and leverage faster warming, so I'm asking here  
to see what thoughts folks have on how to proceed.

Thanks,
	Erik

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I'm quite open to NOT having a JOIN in Solr if flattening the model  
still provides the querying capability desired.  I've not fully  
followed the specifics that Yonik has mentioned on this thread, but  
it certainly is the case that denormalizing/flattening our domain
does not lend itself (easily) to querying exactly how we want.

I implemented the cross-reference caches in Collex knowing it wasn't  
very scalable and was implemented very crudely.  I think the warming  
of these cross-references can be made much smarter (it's braindead  
right now, and builds *everything* over again on a single commit of a  
single object being tagged) - but I've not yet grasped how to be more  
clever with the warming.  If I have the full picture of what changed  
document-wise between the current searcher and the warming one,  
reducing the effort the warmer takes shouldn't be too hard.  Can a  
warming routine know about what changed precisely?
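
To make the idea concrete, here is a sketch of what a smarter warmer could do if it were handed the added and removed "C" documents.  Whether Solr can supply such a delta to a warming routine is exactly the open question, so everything below (the class, its methods, the reference-counting approach) is hypothetical and not a Solr API.

```java
import java.util.*;

// Hypothetical sketch: none of these types are Solr APIs.
class IncrementalTagCache {
    // tag -> (object URI -> number of "C" documents linking them)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Apply one added "C" document (object URI plus its tags).
    void add(String objectUri, Collection<String> tags) {
        for (String tag : tags) {
            counts.computeIfAbsent(tag, k -> new HashMap<>())
                  .merge(objectUri, 1, Integer::sum);
        }
    }

    // Apply one removed "C" document; an entry disappears when its count reaches zero.
    void remove(String objectUri, Collection<String> tags) {
        for (String tag : tags) {
            Map<String, Integer> perObject = counts.get(tag);
            if (perObject == null) continue;
            perObject.computeIfPresent(objectUri, (k, n) -> n > 1 ? n - 1 : null);
            if (perObject.isEmpty()) counts.remove(tag);
        }
    }

    // Distinct objects currently carrying the tag.
    Set<String> objectsWithTag(String tag) {
        Map<String, Integer> perObject = counts.get(tag);
        if (perObject == null) return new TreeSet<>();
        return new TreeSet<>(perObject.keySet());
    }

    public static void main(String[] args) {
        IncrementalTagCache cache = new IncrementalTagCache();
        cache.add("uri:obj1", Arrays.asList("foo"));
        cache.add("uri:obj1", Arrays.asList("foo", "bar")); // a second user's "C" doc
        cache.remove("uri:obj1", Arrays.asList("foo"));     // the first user un-collects
        System.out.println(cache.objectsWithTag("foo"));
    }
}
```

The counts are there because several "C" documents can link the same object to the same tag; removing one of them must not drop the object from that tag's set.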

In short, the JOIN is merely a means to an end.  If I can get to that  
end with Solr as-is, JOIN?  What JOIN?

	Erik

On Feb 3, 2007, at 11:40 PM, Ryan McKinley wrote:

> On 2/3/07, Walter Underwood <wu...@netflix.com> wrote:
>> We would never use JOIN. We denormalize for speed. Not a big deal.
>>
>
> I'm looking at an application where speed is not the only concern.  If
> I can remove the need for a 'normalized' and 'denormalized' form it
> would be a HUGE win.  Essentially I'd like solr to handle the JOIN
> rather than embed an SQL database and keep two databases synchronized.
> If this comes at a slight performance hit, that's fine by me!
>
> When speed is the #1 concern, this may not be an option people should
> use.  Likewise, I don't think the fact that this would break federated
> searching should deter the ambition to enable JOIN-like support for
> solr.

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Ryan McKinley <ry...@gmail.com>.

On 2/3/07, Walter Underwood <wu...@netflix.com> wrote:
> We would never use JOIN. We denormalize for speed. Not a big deal.
>

I'm looking at an application where speed is not the only concern.  If
I can remove the need for a 'normalized' and 'denormalized' form it
would be a HUGE win.  Essentially I'd like solr to handle the JOIN
rather than embed an SQL database and keep two databases synchronized.
If this comes at a slight performance hit, that's fine by me!

When speed is the #1 concern, this may not be an option people should
use.  Likewise, I don't think the fact that this would break federated
searching should deter the ambition to enable JOIN-like support for
solr.

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Ryan McKinley <ry...@gmail.com>.

oops!!!  I meant to reply directly to Brian - an old friend of mine
from graduate school...

next time I'll check the reply-to button more closely.

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Ryan McKinley <ry...@gmail.com>.

>
> I share my dream with Ryan. My dream API looks like this, sticking
> with the artist/track metaphor, in which we have metadata, say the
> ...

you share my dream!  that's amazing!

I really hope Erik persists with the JOIN direction...  (it would be
great to get the Lucene in Action guys working on this!)  I just sent
off for the ruby on rails books...  I've been fighting it for a while,
but Erik is working on solr 'flare' which essentially looks *exactly*
like what I'm building for the BPL portal....

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 3, 2007, at 4:54 PM, Walter Underwood wrote:
> We would never use JOIN. We denormalize for speed. Not a big deal.

So out of curiosity then, what would your design be for indexing
"objects" which have attributes that frequently change, where the
quicker you get that reflected to users the better?  Consider that
the objects may have large amounts of full text associated with them
as well.

	Erik


>
> wunder
> ==
> Search Guru, Netflix
>
>
> On 2/3/07 11:16 AM, "Brian Whitman" <br...@variogr.am> wrote:
>
>> On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote:
>>
>>> I would LOVE to see a JOIN in SOLR.
>>>
>>> I have an index of artists, albums, and songs.  The artists have  
>>> lots
>>> of metadata and the songs very little.  I'd love to be able to  
>>> search
>>> for songs using the artist metadata.  Right now, I have to add  
>>> all the
>>> metadata for each artist and album to each song.
>>
>> I share my dream with Ryan. My dream API looks like this, sticking
>> with the artist/track metaphor, in which we have metadata, say the
>> artist's name, associated to artists that should immediately be
>> available in track searches:
>>
>> schema.xml: <aliasField name="artistName" joinOn="artistID">
>>
>> For track documents, we never <add> with a filled-in artistName. But
>> whenever we query: type:track artistName:aerosmith bpm:[120 TO 140] ,
>> the aliasField live searches the solr index for
>> artistName:aerosmith&fl=artistID, and limits the rest of search space
>> by those results.
>>
>> And for each result, if artistName is an aliasField, solr searches on
>> artistID:X&fl=artistName for each artistID X in the results.
>>
>> If the field denoted in aliasField is given (as it would be for
>> artistName in artist-type documents), the aliasField is disabled for
>> that document.
>>
>> Possibly insane, possibly breaking the intent of Solr, could get slow
>> quick, but also very useful.
>>
>> -Brian
>>
>>

Re: JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Walter Underwood <wu...@netflix.com>.

We would never use JOIN. We denormalize for speed. Not a big deal.

wunder
==
Search Guru, Netflix


On 2/3/07 11:16 AM, "Brian Whitman" <br...@variogr.am> wrote:

> On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote:
> 
>> I would LOVE to see a JOIN in SOLR.
>> 
>> I have an index of artists, albums, and songs.  The artists have lots
>> of metadata and the songs very little.  I'd love to be able to search
>> for songs using the artist metadata.  Right now, I have to add all the
>> metadata for each artist and album to each song.
> 
> I share my dream with Ryan. My dream API looks like this, sticking
> with the artist/track metaphor, in which we have metadata, say the
> artist's name, associated to artists that should immediately be
> available in track searches:
> 
> schema.xml: <aliasField name="artistName" joinOn="artistID">
> 
> For track documents, we never <add> with a filled-in artistName. But
> whenever we query: type:track artistName:aerosmith bpm:[120 TO 140] ,
> the aliasField live searches the solr index for
> artistName:aerosmith&fl=artistID, and limits the rest of search space
> by those results.
> 
> And for each result, if artistName is an aliasField, solr searches on
> artistID:X&fl=artistName for each artistID X in the results.
> 
> If the field denoted in aliasField is given (as it would be for
> artistName in artist-type documents), the aliasField is disabled for
> that document.
> 
> Possibly insane, possibly breaking the intent of Solr, could get slow
> quick, but also very useful.
> 
> -Brian
> 
>

JOIN in Solr (was: convert custom facets to Solr facets...)

Posted by Brian Whitman <br...@variogr.am>.

On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote:

> I would LOVE to see a JOIN in SOLR.
>
> I have an index of artists, albums, and songs.  The artists have lots
> of metadata and the songs very little.  I'd love to be able to search
> for songs using the artist metadata.  Right now, I have to add all the
> metadata for each artist and album to each song.

I share my dream with Ryan. My dream API looks like this, sticking  
with the artist/track metaphor, in which we have metadata, say the  
artist's name, associated to artists that should immediately be  
available in track searches:

schema.xml: <aliasField name="artistName" joinOn="artistID">

For track documents, we never <add> with a filled-in artistName. But  
whenever we query: type:track artistName:aerosmith bpm:[120 TO 140] ,  
the aliasField live searches the solr index for  
artistName:aerosmith&fl=artistID, and limits the rest of search space  
by those results.

And for each result, if artistName is an aliasField, solr searches on  
artistID:X&fl=artistName for each artistID X in the results.

If the field denoted in aliasField is given (as it would be for  
artistName in artist-type documents), the aliasField is disabled for  
that document.

Possibly insane, possibly breaking the intent of Solr, could get slow  
quick, but also very useful.
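
The lookup sequence described above can be sketched over plain in-memory lists.  Note that aliasField is only a wished-for feature, so nothing here is a Solr API; the Artist and Track classes, the track titles, and the bpm values are all made up for the illustration.

```java
import java.util.*;

// Hypothetical in-memory model of the artist/track example; not Solr code.
class AliasFieldSketch {
    static class Artist {
        final String artistID, artistName;
        Artist(String artistID, String artistName) {
            this.artistID = artistID;
            this.artistName = artistName;
        }
    }

    static class Track {
        final String title, artistID;
        final int bpm;
        Track(String title, String artistID, int bpm) {
            this.title = title;
            this.artistID = artistID;
            this.bpm = bpm;
        }
    }

    // Step 1: the equivalent of artistName:<name>&fl=artistID
    static Set<String> artistIdsByName(List<Artist> artists, String name) {
        Set<String> ids = new HashSet<>();
        for (Artist a : artists) {
            if (a.artistName.equalsIgnoreCase(name)) ids.add(a.artistID);
        }
        return ids;
    }

    // Step 2: limit the track search to those artistIDs and apply the bpm range.
    // Step 3: resolve artistName for each hit through its artistID.
    static List<String> search(List<Artist> artists, List<Track> tracks,
                               String name, int minBpm, int maxBpm) {
        Set<String> ids = artistIdsByName(artists, name);
        Map<String, String> nameById = new HashMap<>();
        for (Artist a : artists) nameById.put(a.artistID, a.artistName);
        List<String> hits = new ArrayList<>();
        for (Track t : tracks) {
            if (ids.contains(t.artistID) && t.bpm >= minBpm && t.bpm <= maxBpm) {
                hits.add(t.title + " (" + nameById.get(t.artistID) + ")");
            }
        }
        return hits;
    }

    static List<String> demo() {
        List<Artist> artists = Arrays.asList(
            new Artist("a1", "Aerosmith"), new Artist("a2", "Other Band"));
        List<Track> tracks = Arrays.asList(
            new Track("Track One", "a1", 130),
            new Track("Track Two", "a1", 90),
            new Track("Track Three", "a2", 125));
        return search(artists, tracks, "aerosmith", 120, 140);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost concern is visible even in the sketch: step 1 is an extra search per aliased field in the query, and step 3 is an extra lookup per distinct artistID in the results.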

-Brian

Re: convert custom facets to Solr facets...

Posted by Ryan McKinley <ry...@gmail.com>.

>
> The index has a "type" field: "A" for archived objects and "C" for
> collectibles.  All the original objects are indexed in batch fashion
> as type "A".  Users collect objects and tag/annotate them.  When a
> user collects an object, a document of type "C" is indexed with the
> original object's unique identifier (a URI), the username, tags, and
> annotation.  My custom facet cache differs from the built-in facets
> in that it builds a cross-reference cache from the "C" types to the
> "A" types (a JOIN, heh).
>

I would LOVE to see a JOIN in SOLR.

I have an index of artists, albums, and songs.  The artists have lots
of metadata and the songs very little.  I'd love to be able to search
for songs using the artist metadata.  Right now, I have to add all the
metadata for each artist and album to each song.

this would be great!

ryan

Re: convert custom facets to Solr facets...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 13, 2007, at 6:07 PM, Yonik Seeley wrote:

> On 2/12/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> And just for the record, Solr drives Collex @ NINES:
>> <http://www.nines.org/collex> which implements tagging along with faceted and
>> full-text search.  I've recently hacked our system such that the bulk
>> of our custom  caches are only refreshed when a new batch of data is
>> loaded, and only the "collectable cache" is updated on a <commit/>.
>> This reduced our new index searcher visibility time from 45 seconds
>> down to only a few seconds or less.
>
> Wow!

While this change in when caches get warmed is a big boost to the  
responsiveness to tagging, it's really not all that impressive  
considering the numbers... we have roughly only 500 user/object
"documents".  It's cruising through those and building up tag,
user, and tag-user caches.

> It might be a while before we could come up with any kind of generic
> mechanism that could perform as well as your "hacks" w.r.t warming
> speed :-)

That's reassuring and scary at the same time!  :)

	Erik

Re: convert custom facets to Solr facets...

Posted by Yonik Seeley <yo...@apache.org>.

On 2/12/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> On Feb 12, 2007, at 9:10 PM, Gmail Account wrote:
> > This would be great!  I can't help with the solution but I am very
> > interested in using it if one of you guys can figure it out.
> >
> > I can't wait to see if this works out.
>
> And just for the record, Solr drives Collex @ NINES:
> <http://www.nines.org/collex> which implements tagging along with faceted and
> full-text search.  I've recently hacked our system such that the bulk
> of our custom  caches are only refreshed when a new batch of data is
> loaded, and only the "collectable cache" is updated on a <commit/>.
> This reduced our new index searcher visibility time from 45 seconds
> down to only a few seconds or less.

Wow!
This (the warming time) is a major benefit to having separate
documents that reference "main" documents that are updated less
frequently.
The hard part would be to generalize something like that.
A separate index for each document type sounds like it would help.

> As part of Flare, I will experiment with the tagging design Yonik has
> posted to the wiki but for now our "legacy" application is running
> fine with my early hacks.

My design so far was to show how we could get very far with what we
have now (or almost have now, with updateable documents).  However, it
doesn't take into account things like warming time.

It might be a while before we could come up with any kind of generic
mechanism that could perform as well as your "hacks" w.r.t warming
speed :-)

-Yonik

Re: convert custom facets to Solr facets...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 12, 2007, at 9:10 PM, Gmail Account wrote:
> This would be great!  I can't help with the solution but I am very  
> interested in using it if one of you guys can figure it out.
>
> I can't wait to see if this works out.

And just for the record, Solr drives Collex @ NINES:
<http://www.nines.org/collex> which implements tagging along with faceted and
full-text search.  I've recently hacked our system such that the bulk  
of our custom  caches are only refreshed when a new batch of data is  
loaded, and only the "collectable cache" is updated on a <commit/>.   
This reduced our new index searcher visibility time from 45 seconds  
down to only a few seconds or less.

As part of Flare, I will experiment with the tagging design Yonik has  
posted to the wiki but for now our "legacy" application is running  
fine with my early hacks.

	Erik

Re: convert custom facets to Solr facets...

Posted by Gmail Account <ma...@gmail.com>.

This would be great!  I can't help with the solution but I am very 
interested in using it if one of you guys can figure it out.

I can't wait to see if this works out.

Mike

----- Original Message ----- 
From: "Erik Hatcher" <er...@ehatchersolutions.com>
To: <so...@lucene.apache.org>
Sent: Tuesday, February 06, 2007 4:51 AM
Subject: Re: convert custom facets to Solr facets...


> Yonik - this is great!   Thanks for codifying the use cases and  providing 
> a possible implementation.  I'll tinker with this more when  I can.
>
> Erik
>
>
> On Feb 4, 2007, at 2:13 PM, Yonik Seeley wrote:
>
>> I was confusing myself too much without nailing down more concrete 
>> examples,
>> so I took a shot at coming up with user tagging usecases and
>> a way to implement them with a flat schema.
>>
>> The usecases may be biased toward a flat schema since that's what I
>> had in mind... so feel free to add more, or change the usecase names
>> or descriptions to make more sense.
>>
>> http://wiki.apache.org/solr/UserTagDesign
>>
>> -Yonik
>

Re: convert custom facets to Solr facets...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Yonik - this is great!   Thanks for codifying the use cases and  
providing a possible implementation.  I'll tinker with this more when  
I can.

	Erik

On Feb 4, 2007, at 2:13 PM, Yonik Seeley wrote:

> I was confusing myself too much without nailing down more concrete  
> examples,
> so I took a shot at coming up with user tagging usecases and
> a way to implement them with a flat schema.
>
> The usecases may be biased toward a flat schema since that's what I
> had in mind... so feel free to add more, or change the usecase names
> or descriptions to make more sense.
>
> http://wiki.apache.org/solr/UserTagDesign
>
> -Yonik

Re: convert custom facets to Solr facets...

Posted by Yonik Seeley <yo...@apache.org>.

I was confusing myself too much without nailing down more concrete examples,
so I took a shot at coming up with user tagging usecases and
a way to implement them with a flat schema.

The usecases may be biased toward a flat schema since that's what I
had in mind... so feel free to add more, or change the usecase names
or descriptions to make more sense.

http://wiki.apache.org/solr/UserTagDesign

-Yonik

Re: convert custom facets to Solr facets...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 3, 2007, at 11:55 AM, Yonik Seeley wrote:
> On 2/3/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>> On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
>> > One downside of doing joins is that it makes it pretty hard to
>> > distribute/federate in the future because a document doesn't stand
>> > alone.
>>
>> The connection between objects is key in our library domain though.
>>
>> > A flat structure for tagging could be to add a
>> > taguser and tag field to the actual document each time a user
>> > tagged a document.
>>
>> I've been contemplating how that would look and work.  But the
>> downsides you mention are sorta show-stoppers for our needs:
>
> The main one being query or facet by all tags for a specific user?

Yeah, being able to query across all objects, or all objects a user  
has collected.  Both the all/mine modes are requirements in Collex  
(and already there, with cache reloading now being too slow).

> I assume an annotation is a comment (like a few sentences)?
> If you search on comments, do you just get the comments back with a
> pointer to the original doc, or do you get the original doc back?

Currently we don't have a feature to search annotations (yeah, they  
are just private user-specific comments).  If we searched on it we'd  
want both back, the original object and the annotation.

To load a specific object in Collex, I have a special request handler  
that pulls the original object by id, and also folds in the tags/ 
annotation for the username parameter specified in the request.

> Storing comments on a document:
> - could lead to increased relevancy... all comments from all users
>   would be considered together for term-freq

Note, we do keep annotations private between users, but tags are public.

> - easy to get comments for a list of documents in a single query
> - can use lucene syntax across "A" fields like title, and commentary.
>    +title:solr +comments:great
> - harder to search for comments from a specific user only
>   (need sloppy phrase or span queries to do this?)

Keeping things separate between users is important, as well as  
folding them together on tags.  Again, annotations are currently  
private in our system.

> Storing comments separately:
> - if you search in comments, you get the exact comment that  
> matched... if you
>   stored all comments on the A doc, you wouldn't know which matched
> (but highlighting
>   could help with that).

With annotations being private this won't be an issue.  Any search in  
annotations would be ANDed with the logged in username.  And there is  
only one annotation per collected object per user.

It has been discussed to allow the user to set whether an annotation  
is public or private.

> - easy to search comments only from a specific user
>
> Do comments need to be included in faceting in any way?

No, not at all.  Again, we've not done any annotation searching in  
Collex yet.  That is a very desirable feature though.

> ps: If I'm making less sense than usual, it might just be because it's
> the time of the year that kids bring home nasty germs, and I'm feeling
> rather fuzzy headed :-)

I know the feeling!

	Erik

p.s. If Solr can solve this situation of tagging objects in a  
generalizable way, we are really really rocking!   Consider Flickr's  
latest "machine tags":
<http://www.flickr.com/groups/api/discuss/72157594497877875/>

Re: convert custom facets to Solr facets...

Posted by Yonik Seeley <yo...@apache.org>.

On 2/3/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
> > One downside of doing joins is that it makes it pretty hard to
> > distribute/federate in the future because a document doesn't stand
> > alone.
>
> The connection between objects is key in our library domain though.
>
> > A flat structure for tagging could be to add a
> > taguser and tag field to the actual document each time a user
> > tagged a document.
>
> I've been contemplating how that would look and work.  But the
> downsides you mention are sorta show-stoppers for our needs:

The main one being query or facet by all tags for a specific user?
That's doable I think.

I assume an annotation is a comment (like a few sentences)?
If you search on comments, do you just get the comments back with a
pointer to the original doc, or do you get the original doc back?

Storing comments on a document:
 - could lead to increased relevancy... all comments from all users would be
   considered together for term-freq
 - easy to get comments for a list of documents in a single query
 - can use lucene syntax across "A" fields like title, and commentary.
    +title:solr +comments:great
 - harder to search for comments from a specific user only
   (need sloppy phrase or span queries to do this?)

Storing comments separately:
 - if you search in comments, you get the exact comment that matched... if you
   stored all comments on the A doc, you wouldn't know which matched
   (but highlighting could help with that).
 - easy to search comments only from a specific user

Do comments need to be included in faceting in any way?

-Yonik

ps: If I'm making less sense than usual, it might just be because it's
the time of the year that kids bring home nasty germs, and I'm feeling
rather fuzzy headed :-)

Re: convert custom facets to Solr facets...

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 2, 2007, at 4:29 PM, Yonik Seeley wrote:
> One downside of doing joins is that it makes it pretty hard to
> distribute/federate in the future because a document doesn't stand
> alone.

The connection between objects is key in our library domain though.

> A flat structure for tagging could be to add a
> taguser and tag field to the actual document each time a user  
> tagged a document.

I've been contemplating how that would look and work.  But the  
downsides you mention are sorta show-stoppers for our needs:

> - filter query results by a constraint tag=foo
> fq=tag:foo
>
> You wouldn't be able to query for:
> - total number of tags

Couldn't that be the term frequency information?

> - items with the largest number of tags

tag frequency is very important, but having a tag field would give us  
frequency per tag term.  So I don't see this as a problem.

> - a tag by a specific user... that would require something like a
> phrase match across fields.

This is necessary too.  The Collex sidebar allows you to see all  
objects tagged as "foo" by a specific user.

> Downsides of a flat structure:
> - you need to reindex the whole document, or have updateable documents
> - even with updateable documents, it could be costly to update
>  (if people's tagging rate is fairly low, this may not matter much)

I figured this usecase would lend itself well to updateable docs,  
though I've not yet visualized how this would work entirely.

>
> --- Separate tag or collectible objects ---
>>    - all collected objects
> The count of all tagged objects?  how would you do this?
>>    - all objects collected by erikhatcher
> facet.query=C_user:erikhatcher
>>    - all collected objects with tag "foo"
> facet.query=C_tag:foo

The "all objects tagged 'foo' by erikhatcher" query is the holy grail, eh?

> - facet by tag
> facet.field=C_tag   (this would give counts of *tags* not documents)

These are important numbers too.  But object count per tag is the ideal.

> - filter query results by a constraint tag=foo
> Not currently doable, would need to build up a filter somehow...
> indirectFilter=id:((C_tag:foo).C_uid)
>
> If an indirect approach has enough advantages, we could perhaps come
> up with a way to express it.

I like it!

>> My custom facet cache differs from the built-in facets
>> in that it builds a cross-reference cache from the "C" types to the
>> "A" types (a JOIN, heh).
>
> What does the cross-reference cache look like when it's built?  A
> simple int[]?
> To do this more efficiently, it seems like one would want separate indices
> for the A and C docs to keep maxDoc() down.


     // cache: facet name -> lookup table for that facet
     cache = new HashMap<String, Map>();
     // username -> (tag -> DocSet)
     Map<String,Map<String,DocSet>> userTagMap = new HashMap<String,Map<String,DocSet>>();
     // tag -> DocSet
     Map<String,DocSet> tagMap = new HashMap<String, DocSet>();
     // username -> DocSet
     Map<String,DocSet> userMap = new HashMap<String, DocSet>();
     // single key "collected" -> DocSet of everything collected
     Map<String, DocSet> collectedMap = new HashMap<String, DocSet>();
     DocSet collectedSet = new BitDocSet();
     collectedMap.put("collected", collectedSet);
     cache.put("tag", tagMap);
     cache.put("usertag", userTagMap);
     cache.put("username", userMap);
     cache.put("collected", collectedMap);

so basically (in Ruby code) I have the following to get a DocSet:

	cache['tag'][tag]

or
	cache['usertag'][username][tag]
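
For comparison, the same two lookups can be written as a self-contained Java sketch.  It uses java.util.BitSet as a stand-in for Solr's DocSet so that it runs without Solr on the classpath; the class and method names are invented for the example and are not the Collex code.

```java
import java.util.*;

// Sketch only: java.util.BitSet stands in for Solr's DocSet.
class UserTagCacheSketch {
    final Map<String, BitSet> tag = new HashMap<>();
    final Map<String, Map<String, BitSet>> usertag = new HashMap<>();

    // Record that `username` applied `tagName` to the document with internal id `docId`.
    void record(String username, String tagName, int docId) {
        tag.computeIfAbsent(tagName, k -> new BitSet()).set(docId);
        usertag.computeIfAbsent(username, k -> new HashMap<>())
               .computeIfAbsent(tagName, k -> new BitSet())
               .set(docId);
    }

    // Equivalent of cache['tag'][tag]; an empty set when the tag is unknown.
    BitSet byTag(String tagName) {
        return tag.getOrDefault(tagName, new BitSet());
    }

    // Equivalent of cache['usertag'][username][tag].
    BitSet byUserTag(String username, String tagName) {
        Map<String, BitSet> perUser = usertag.get(username);
        if (perUser == null) return new BitSet();
        return perUser.getOrDefault(tagName, new BitSet());
    }

    public static void main(String[] args) {
        UserTagCacheSketch cache = new UserTagCacheSketch();
        cache.record("erikhatcher", "foo", 3);
        cache.record("someoneelse", "foo", 7);
        System.out.println(cache.byTag("foo"));
        System.out.println(cache.byUserTag("erikhatcher", "foo"));
    }
}
```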

Interestingly, I do build a separate RAMDirectory index for another  
purpose under Collex: agent name lookup, where agents are associated  
with one or more roles.

> What's the id for the C docs?  user catenated with id of the collected
> doc, so all tags/comments for a particular user on a particular doc go
> in the same C doc?

Yes, a collectable object has a URI in this form:
"#{object_id}/#{username}"
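
A small sketch of building and splitting such an id in Java follows.  The class is hypothetical, and the parse step assumes usernames never contain a slash (the object id, being a URI, may well contain several), which is why it splits at the last one.

```java
// Hypothetical helper for ids of the form "<object_id>/<username>".
class CollectibleId {
    // Build the id of the "C" document for one user and one object.
    static String build(String objectId, String username) {
        return objectId + "/" + username;
    }

    // Split at the LAST slash; assumes the username itself contains no '/'.
    // Returns { objectId, username }.
    static String[] parse(String id) {
        int slash = id.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("no '/' in id: " + id);
        }
        return new String[] { id.substring(0, slash), id.substring(slash + 1) };
    }

    public static void main(String[] args) {
        // example.org is a placeholder, not a real Collex object URI
        String id = build("http://example.org/objects/42", "erikhatcher");
        String[] parts = parse(id);
        System.out.println(id + " -> " + parts[0] + " + " + parts[1]);
    }
}
```

With this scheme, re-indexing a user's tags for an object overwrites the same "C" document rather than adding a new one, since the unique key is unchanged.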

Thanks for the feedback thus far.  I'm optimistic we'll find a good  
solution to this.  Worst case, I continue to use my hack for mapping  
associations, but tune the cache generation a bit.

	Erik

Re: convert custom facets to Solr facets...

Posted by Yonik Seeley <yo...@apache.org>.

One downside of doing joins is that it makes it pretty hard to
distribute/federate in the future because a document doesn't stand
alone.

A flat structure for tagging could be to add a
taguser and tag field to the actual document each time a user tagged a document.

>    - all collected objects
facet.query=tag:*
>    - all objects collected by erikhatcher
facet.query=taguser:erikhatcher
>    - all collected objects with tag "foo"
facet.query=tag:foo
- facet by tag
facet.field=tag
- filter query results by a constraint tag=foo
fq=tag:foo
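
As an illustration, the facet parameters above can be combined into a single request's query string.  This sketch only assembles and URL-encodes the parameters; the helper class is made up, the main query is whatever the user typed, and fq=tag:foo is left out because it would narrow the result set the other counts are computed over.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; it builds a query string and does not talk to Solr.
class FacetRequestSketch {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Join name/value pairs into a URL query string; names may repeat.
    static String queryString(String[][] params) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(p[0])).append('=').append(enc(p[1]));
        }
        return sb.toString();
    }

    // Facet counts for the flat "tag"/"taguser" fields, over the hits for userQuery.
    static String flatTaggingRequest(String userQuery) {
        return queryString(new String[][] {
            {"q", userQuery},
            {"facet", "true"},
            {"facet.query", "tag:*"},               // all collected objects
            {"facet.query", "taguser:erikhatcher"}, // collected by erikhatcher
            {"facet.query", "tag:foo"},             // collected with tag "foo"
            {"facet.field", "tag"},                 // counts per tag
        });
    }

    public static void main(String[] args) {
        System.out.println(flatTaggingRequest("solr"));
    }
}
```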

You wouldn't be able to query for:
 - total number of tags
 - items with the largest number of tags
 - a tag by a specific user... that would require something like a
phrase match across fields.
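
The last limitation is easy to see in a toy model.  The sketch below treats a flattened document as two independent multi-valued fields, which is an assumption about how the taguser and tag fields would be stored; the class is invented for the example.

```java
import java.util.*;

// Toy model of one flattened document; not Solr code.
class FlatTagPairingSketch {
    static class FlatDoc {
        // Parallel multi-valued fields: nothing links a user value to a tag value.
        final Set<String> taguser = new HashSet<>();
        final Set<String> tag = new HashSet<>();

        void addTagging(String user, String t) {
            taguser.add(user);
            tag.add(t);
        }
    }

    // All that taguser:<user> AND tag:<t> can test: membership in each field.
    static boolean matches(FlatDoc d, String user, String t) {
        return d.taguser.contains(user) && d.tag.contains(t);
    }

    static FlatDoc sample() {
        FlatDoc d = new FlatDoc();
        d.addTagging("erikhatcher", "foo");
        d.addTagging("someoneelse", "bar");
        return d;
    }

    public static void main(String[] args) {
        FlatDoc d = sample();
        System.out.println(matches(d, "erikhatcher", "foo")); // a real pairing
        System.out.println(matches(d, "erikhatcher", "bar")); // also matches, wrongly
    }
}
```

Both checks succeed even though erikhatcher never applied "bar", because the pairing between user and tag is lost once the values are flattened into separate fields.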

Downsides of a flat structure:
 - you need to reindex the whole document, or have updateable documents
 - even with updateable documents, it could be costly to update
  (if people's tagging rate is fairly low, this may not matter much)

--- Separate tag or collectible objects ---
>    - all collected objects
The count of all tagged objects?  how would you do this?
>    - all objects collected by erikhatcher
facet.query=C_user:erikhatcher
>    - all collected objects with tag "foo"
facet.query=C_tag:foo

- facet by tag
facet.field=C_tag   (this would give counts of *tags* not documents)

- filter query results by a constraint tag=foo
Not currently doable, would need to build up a filter somehow...
indirectFilter=id:((C_tag:foo).C_uid)

If an indirect approach has enough advantages, we could perhaps come
up with a way to express it.
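
One way to read the indirectFilter idea is as two steps: collect the C_uid values of the "C" docs matching C_tag:foo, then keep only the results whose id is in that set.  The sketch below works on plain lists with made-up ids; indirectFilter is only a proposed syntax, so this is one possible interpretation and not Solr behaviour.

```java
import java.util.*;

// Hypothetical two-step filter over in-memory data; not a Solr API.
class IndirectFilterSketch {
    // Each "C" doc is a { C_uid, C_tag } pair; returns the uids tagged with `tag`.
    static Set<String> uidsForTag(List<String[]> cDocs, String tag) {
        Set<String> uids = new HashSet<>();
        for (String[] c : cDocs) {
            if (c[1].equals(tag)) uids.add(c[0]);
        }
        return uids;
    }

    // Keep only the result ids that appear among the allowed uids, preserving order.
    static List<String> filter(List<String> resultIds, Set<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (String id : resultIds) {
            if (allowed.contains(id)) kept.add(id);
        }
        return kept;
    }

    static List<String> demo() {
        List<String[]> cDocs = Arrays.asList(
            new String[] {"obj1", "foo"},
            new String[] {"obj2", "bar"},
            new String[] {"obj3", "foo"});
        List<String> results = Arrays.asList("obj1", "obj2", "obj4");
        return filter(results, uidsForTag(cDocs, "foo"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```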

> My custom facet cache differs from the built-in facets
> in that it builds a cross-reference cache from the "C" types to the
> "A" types (a JOIN, heh).

What does the cross-reference cache look like when it's built?  A simple int[]?
To do this more efficiently, it seems like one would want separate indices
for the A and C docs to keep maxDoc() down.

What's the id for the C docs?  user catenated with id of the collected
doc, so all tags/comments for a particular user on a particular doc go
in the same C doc?

-Yonik


On 2/2/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Before Solr had facets, I built my own implementation in a much
> cruder and less performant way into Collex as custom request handlers.
>
> Now the performance issue of warming up the cache needs to be
> addressed.  I'm going to upgrade Solr and adjust the application to
> work with the built-in faceting and see how far I get with that.  The
> dilemma is that I've got a couple of custom things that don't map to
> the built-in faceting and I'm looking for advice on how to proceed.
>
> The index has a "type" field: "A" for archived objects and "C" for
> collectibles.  All the original objects are indexed in batch fashion
> as type "A".  Users collect objects and tag/annotate them.  When a
> user collects an object, a document of type "C" is indexed with the
> original object's unique identifier (a URI), the username, tags, and
> annotation.  My custom facet cache differs from the built-in facets
> in that it builds a cross-reference cache from the "C" types to the
> "A" types (a JOIN, heh).
>
> We can do queries that return facet counts such as:
>
>    - all collected objects
>    - all objects collected by erikhatcher
>    - all collected objects with tag "foo"
>
> One of the facet counts returned is user, so you can easily see how
> many objects each user has collected.
>
> For the basic faceting we do on object metadata, this will fit well
> with what Solr has built-in, but I'm not quite sure how to build in
> the cross-reference and leverage faster warming, so I'm asking here
> to see what thoughts folks have on how to proceed.