Posted to solr-user@lucene.apache.org by Peter S <pe...@hotmail.com> on 2010/01/29 12:16:44 UTC

Aggregated facet value counts?

Hi,

 

I was wondering if anyone had come across this use case, and if this type of faceting is possible:

 

The requirement is to build a query such that an aggregated facet count of common (and'ed) field values forms the basis of each returned facet count.

 

For example:

Let's say I have a number of documents in an index with, among others, the fields 'host' and 'user':

 

Doc1  host:machine_1   user:user_1

Doc2  host:machine_1   user:user_2

Doc3  host:machine_1   user:user_1

Doc3  host:machine_1   user:user_1

 

Doc4  host:machine_2   user:user_1

Doc5  host:machine_2   user:user_1

Doc6  host:machine_2   user:user_4

 

Doc7  host:machine_1   user:user_4

 

Is it possible to get facets back that would give the count of documents that have common host AND user values (preferably ordered - i.e. host then user for this example, so as not to create a factorial explosion)? Note that the caller wouldn't know what machine and user values exist, only the field names.

I've tried using facet queries in various ways to see if they could work for this, but I believe facet queries work on a different plane than this requirement (narrowing the term count, as opposed to aggregating).

 

For the example above, the desired result would be:

 

machine_1/user_1 (3)

machine_1/user_2 (1)

machine_1/user_4 (1)

 

machine_2/user_1 (2)

machine_2/user_4 (1)
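
For reference, the counts listed above can be reproduced outside Solr with a few lines of Python. This is only a sketch of the aggregation being asked for, not of any Solr feature; the `docs` list restates the sample documents from this message, and `pair_counts` is a hypothetical helper name.

```python
from collections import Counter

# The eight sample documents from the message (the label Doc3 appears twice there).
docs = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]

def pair_counts(documents, first, second):
    """Count documents per (first, second) value pair, sorted by first then second."""
    counts = Counter((d[first], d[second]) for d in documents)
    return sorted(counts.items())

for (host, user), n in pair_counts(docs, "host", "user"):
    print(f"{host}/{user} ({n})")
```

The caller passes only the field names, which matches the constraint that the machine and user values are not known in advance.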

 

Has anyone had a need for this type of faceting and found a way to achieve it?

 

Many thanks,

Peter

 

 
 		 	   		  

RE: Aggregated facet value counts?

Posted by Peter S <pe...@hotmail.com>.

Tree faceting - that sounds very interesting indeed. I'll have a look into that and see how it fits, as well as any improvements for adding facet queries, cross-field aggregation, date range etc. There could be some very nice use-cases for such functionality. Just wondering how this would work with distributed shards/multi-core...


Many Thanks! 

Peter

 

 
> From: erik.hatcher@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 12:20:07 -0500
> 
> Sounds like what you're asking for is tree faceting. A basic 
> implementation is available in SOLR-792, but one that could also take 
> facet.queries, numeric or date range buckets, to tree on would be a 
> nice improvement.
> 
> Still, the underlying implementation will simply enumerate all the 
> possible values (SOLR-792 has some short-circuiting when the top-level 
> has zero, of course). A client-side application could do this with 
> multiple requests to Solr.
> 
> Subsearch - sure, just make more requests to Solr, rearranging the 
> parameters.
> 
> I'd still say that for this type of need the requirement will
> "generally" be less arbitrary, and that locking some things in during
> indexing will be the pragmatic way to go in most cases.
> 
> Erik
> 
> 
> 

Re: Aggregated facet value counts?

Posted by Erik Hatcher <er...@gmail.com>.

Sounds like what you're asking for is tree faceting.  A basic  
implementation is available in SOLR-792, but one that could also take  
facet.queries, numeric or date range buckets, to tree on would be a  
nice improvement.

Still, the underlying implementation will simply enumerate all the  
possible values (SOLR-792 has some short-circuiting when the top-level  
has zero, of course).  A client-side application could do this with  
multiple requests to Solr.

Subsearch - sure, just make more requests to Solr, rearranging the  
parameters.

I'd still say that for this type of need the requirement will
"generally" be less arbitrary, and that locking some things in during
indexing will be the pragmatic way to go in most cases.

	Erik
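
The client-side, multiple-request approach mentioned above could look roughly like the following sketch. It is not the SOLR-792 implementation. It assumes the default flat JSON layout for facet fields (`[value, count, value, count, ...]`), uses only standard request parameters (`q`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount`, `fq`), and leaves the HTTP call to a caller-supplied `fetch` function; `facet_params`, `facet_values`, `tree_facet`, and `fetch` are all hypothetical names.

```python
def facet_params(field, filters=()):
    """Build Solr facet parameters for one field, with optional fq filters."""
    params = [("q", "*:*"), ("rows", "0"), ("wt", "json"), ("facet", "true"),
              ("facet.field", field), ("facet.mincount", "1")]
    params += [("fq", f) for f in filters]
    return params

def facet_values(response, field):
    """Turn the flat [value, count, value, count, ...] facet list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))

def tree_facet(fetch, parent, child):
    """One request for the parent field, then one filtered request per parent value.

    `fetch` is a caller-supplied function that sends the parameters to Solr
    and returns the parsed JSON response as a dict.
    """
    tree = {}
    for value, _count in facet_values(fetch(facet_params(parent)), parent):
        # Values containing double quotes would need escaping; omitted in this sketch.
        fq = '{}:"{}"'.format(parent, value)
        tree[value] = facet_values(fetch(facet_params(child, [fq])), child)
    return tree
```

With a real server, `fetch` could be a small function that issues an HTTP GET against the core's select handler and parses the JSON body. The number of requests grows with the number of parent values, which is the enumeration cost mentioned above.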



On Jan 29, 2010, at 9:28 AM, Peter S wrote:

>
> Well, it wouldn't be 'every' combination - more of 'any' combination  
> at query-time.
>
> The 'arbitrary' part of the requirement is because it's not  
> practical to predict every combination a user might ask for,  
> although generally users would tend to search for similar/the same  
> query combinations (but perhaps with different date ranges, for  
> example).
>
> If 'predicted aggregate fields' were calculated at index-time on,
> say, 10 fields (the schema in question actually has 73 fields),
> that's 3,628,800 (10!) new fields. A large percentage of these would
> likely never be used (which ones would depend on the user,
> environment etc.).
>
>
> Perhaps a more 'typical' use case than my network-based example  
> would be a product search web page, where you want to show the  
> number of products that are made by a manufacturer and within a  
> certain price range (e.g. Sony [$600-$800] (15) ). To obtain the  
> (15) facet count value, you would have to correlate the number of  
> Sony products (say, (861)), and the products that fall into the [600  
> TO 800] price range (say, (1226) ). The (15) would be the  
> intersection of the Sony hits and the price range hits by  
> 'manufacturer:Sony'. Am I right that filter queries could only do  
> this for document hits if you know the field values ahead of time  
> (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could  
> then be derived by simply counting the numFound for each result set.
>
>
>
> If there were subsearch support in Solr (i.e. take the output of a  
> query and use it as input into another) that included facets  
> [perhaps there is such support?], it might be used to achieve this  
> effect.
>
>
> A custom query parser plugin could work, maybe? I suppose it would  
> need to gather up all the separate facets and correlate them  
> according to the input query (e.g. host and user, or manufacturer  
> and price range). Such a mechanism would be crying out for caching,  
> but perhaps it could leverage the existing field and query caches.
>
>
> Peter
>
>
>
>

RE: Aggregated facet value counts?

Posted by Peter S <pe...@hotmail.com>.

Well, it wouldn't be 'every' combination - more of 'any' combination at query-time.

The 'arbitrary' part of the requirement is because it's not practical to predict every combination a user might ask for, although generally users would tend to search for similar/the same query combinations (but perhaps with different date ranges, for example).

If 'predicted aggregate fields' were calculated at index-time on, say, 10 fields (the schema in question actually has 73 fields), that's 3,628,800 (10!) new fields. A large percentage of these would likely never be used (which ones would depend on the user, environment etc.).
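
The size of that explosion can be checked with a little arithmetic. The sketch below assumes Python 3.8 or later (for `math.comb` and `math.perm`) and counts how many combined fields a few different precomputation strategies would need; which strategy the figure above has in mind is an assumption.

```python
import math

fields = 10

# Orderings of all 10 fields: the factorial figure.
print(math.factorial(fields))  # 3628800

# Ordered combinations of any 2..10 of the 10 fields.
print(sum(math.perm(fields, k) for k in range(2, fields + 1)))  # 9864090

# Unordered pairs only, for 10 fields and for the full 73-field schema.
print(math.comb(fields, 2), math.comb(73, 2))  # 45 2628
```

Restricting the precomputed fields to pairs keeps the count manageable even for 73 fields, whereas allowing any ordered combination does not.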

Perhaps a more 'typical' use case than my network-based example would be a product search web page, where you want to show the number of products that are made by a manufacturer and within a certain price range (e.g. Sony [$600-$800] (15) ). To obtain the (15) facet count value, you would have to correlate the number of Sony products (say, (861)), and the products that fall into the [600 TO 800] price range (say, (1226) ). The (15) would be the intersection of the Sony hits and the price range hits by 'manufacturer:Sony'. Am I right that filter queries could only do this for document hits if you know the field values ahead of time (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could then be derived by simply counting the numFound for each result set.
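
When the values are known ahead of time, the intersection counts described above can be requested in a single call with one `facet.query` per combination. A minimal sketch, assuming the field names `manufacturer` and `price` from the example; the manufacturer list and price ranges are made-up illustration values, and `combo_facet_params` is a hypothetical helper.

```python
from itertools import product
from urllib.parse import urlencode

def combo_facet_params(manufacturers, price_ranges):
    """Build Solr parameters with one facet.query per manufacturer/price-range pair."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    for maker, (low, high) in product(manufacturers, price_ranges):
        # Manufacturer names containing spaces would need quoting; omitted here.
        query = "manufacturer:{} AND price:[{} TO {}]".format(maker, low, high)
        params.append(("facet.query", query))
    return params

# Illustrative values only; they are not taken from a real index.
params = combo_facet_params(["Sony", "Canon"], [(600, 800), (800, 1000)])
print(urlencode(params))
```

Each count comes back keyed by its query string in the `facet_queries` part of the response. This still requires the caller to know the values up front, which is the limitation raised in the paragraph above.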

If there were subsearch support in Solr (i.e. take the output of a query and use it as input into another) that included facets [perhaps there is such support?], it might be used to achieve this effect.

A custom query parser plugin could work, maybe? I suppose it would need to gather up all the separate facets and correlate them according to the input query (e.g. host and user, or manufacturer and price range). Such a mechanism would be crying out for caching, but perhaps it could leverage the existing field and query caches.

Peter

> From: erik.hatcher@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 07:39:44 -0500
> 
> Creating values for every possible combination is what you're asking 
> Solr to do at query-time, and as far as I know there isn't really a 
> way to accomplish that like you're asking. Is the need really to be 
> arbitrary here?
> 
> Erik
> 

Re: Aggregated facet value counts?

Posted by Erik Hatcher <er...@gmail.com>.

Creating values for every possible combination is what you're asking  
Solr to do at query-time, and as far as I know there isn't really a  
way to accomplish that like you're asking.   Is the need really to be  
arbitrary here?

	Erik

On Jan 29, 2010, at 7:25 AM, Peter S wrote:

>
> Hi Erik,
>
>
>
> Thanks for your reply. That's an interesting idea doing it at
> index-time, and a good idea for known field combinations.
>
> The only thing is........
>
> How to handle arbitrary field combinations? - i.e. to allow the  
> caller to specify any combination of fields at query-time?
>
> So, yes, the data is available at index-time, but the combination  
> isn't (short of creating fields for every possible combination).
>
>
>
> Peter
>
>
>

RE: Aggregated facet value counts?

Posted by Peter S <pe...@hotmail.com>.

Hi Erik,

 

Thanks for your reply. That's an interesting idea doing it at index-time, and a good idea for known field combinations.

The only thing is........

How to handle arbitrary field combinations? - i.e. to allow the caller to specify any combination of fields at query-time?

So, yes, the data is available at index-time, but the combination isn't (short of creating fields for every possible combination).

 

Peter


 
> From: erik.hatcher@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 06:30:27 -0500
> 
> When faced with this type of situation where the data is entirely 
> available at index-time, simply create an aggregated field that glues 
> the two pieces together, and facet on that.
> 
> Erik
> 

Re: Aggregated facet value counts?

Posted by Erik Hatcher <er...@gmail.com>.

When faced with this type of situation where the data is entirely  
available at index-time, simply create an aggregated field that glues  
the two pieces together, and facet on that.

	Erik
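
A minimal sketch of the index-time approach described above, assuming documents are plain dictionaries prepared in Python before being sent to Solr. The combined field name `host_user`, the `/` separator, and the helper `add_combined_field` are illustrative assumptions and do not come from this thread; `q`, `rows`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def add_combined_field(doc, fields=("host", "user"), sep="/"):
    """Return a copy of doc with an extra field gluing the given field values together.

    The combined field name (e.g. "host_user") is an illustrative choice.
    """
    combined = dict(doc)
    combined["_".join(fields)] = sep.join(str(doc[f]) for f in fields)
    return combined

doc = add_combined_field({"host": "machine_1", "user": "user_1"})
print(doc["host_user"])  # machine_1/user_1

# Query-string parameters for faceting on the combined field once it is indexed.
params = urlencode({"q": "*:*", "rows": 0, "facet": "true", "facet.field": "host_user"})
print(params)
```

The combined field would normally be an untokenized (string) field so that each glued value stays a single facet term.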

On Jan 29, 2010, at 6:16 AM, Peter S wrote:

>
> Hi,
>
>
>
> I was wondering if anyone had come across this use case, and if this  
> type of faceting is possible:
>
>
>
> The requirement is to build a query such that an aggregated facet
> count of common (and'ed) field values forms the basis of each
> returned facet count.
>
>
>
> For example:
>
> Let's say I have a number of documents in an index with, among  
> others, the fields 'host' and 'user':
>
>
>
> Doc1  host:machine_1   user:user_1
>
> Doc2  host:machine_1   user:user_2
>
> Doc3  host:machine_1   user:user_1
>
> Doc3  host:machine_1   user:user_1
>
>
>
> Doc4  host:machine_2   user:user_1
>
> Doc5  host:machine_2   user:user_1
>
> Doc6  host:machine_2   user:user_4
>
>
>
> Doc7  host:machine_1   user:user_4
>
>
>
> Is it possible to get facets back that would give the count of  
> documents that have common host AND user values (preferably ordered  
> - i.e. host then user for this example, so as not to create a  
> factorial explosion)? Note that the caller wouldn't know what  
> machine and user values exist, only the field names.
>
> I've tried using facet queries in various ways to see if they could
> work for this, but I believe facet queries work on a different plane
> than this requirement (narrowing the term count, as opposed to
> aggregating).
>
>
>
> For the example above, the desired result would be:
>
>
>
> machine_1/user_1 (3)
>
> machine_1/user_2 (1)
>
> machine_1/user_4 (1)
>
>
>
> machine_2/user_1 (2)
>
> machine_2/user_4 (1)
>
>
>
> Has anyone had a need for this type of faceting and found a way to  
> achieve it?
>
>
>
> Many thanks,
>
> Peter
>
>
>
>
> 		 	   		