Posted to dev@lucene.apache.org by Morten Lied Johansen <mo...@ifi.uio.no> on 2011/11/30 14:25:15 UTC

Stats per group with StatsComponent?

Hi

I posted the below mail to the solr-user list a little over a week ago. 
Since there has been no response, we assume this means that what we need 
is not currently possible.

We need this functionality, and are willing to put in time and effort to 
implement it, but could use some pointers to where it would be natural 
to add this, and ideas for how to best solve it.

I'm also wondering if I should create an issue in JIRA right away, or if 
I should wait until we have a first patch ready?


-------- Original Message --------
Subject: Stats per group with StatsComponent?
Date: Tue, 22 Nov 2011 14:40:45 +0100
From: Morten Lied Johansen <mo...@ifi.uio.no>
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org


Hi

We need to get minimum and maximum values for a field, within a group in
a grouped search-result. Is this possible today, perhaps by using
StatsComponent some way?

I'll flesh out the example a little, to make the question clearer.

We have a number of documents, indexed with a price, date and a hotel.
For each hotel, there are a number of documents, each representing a
price/date combination. We then group our search result on hotel.

We want to show the minimum and maximum price for each hotel.
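
To make the goal concrete, here is a minimal sketch in plain Java (no
Solr or Lucene involved) of the result being asked for: one min / max
price pair per hotel. The Offer and HotelPriceRange classes and the
sample prices are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for one indexed document: a price/date combination for a hotel.
final class Offer {
    final String hotel;
    final String date;
    final double price;

    Offer(String hotel, String date, double price) {
        this.hotel = hotel;
        this.date = date;
        this.price = price;
    }
}

final class HotelPriceRange {
    // Groups offers by hotel and summarizes the prices inside each group.
    // DoubleSummaryStatistics tracks min and max (as well as count, sum and average).
    static Map<String, DoubleSummaryStatistics> priceStatsByHotel(List<Offer> offers) {
        return offers.stream().collect(Collectors.groupingBy(
                offer -> offer.hotel,
                TreeMap::new,
                Collectors.summarizingDouble(offer -> offer.price)));
    }

    public static void main(String[] args) {
        List<Offer> offers = new ArrayList<>();
        offers.add(new Offer("Hotel A", "2011-12-01", 120.0));
        offers.add(new Offer("Hotel A", "2011-12-02", 95.0));
        offers.add(new Offer("Hotel B", "2011-12-01", 210.0));
        offers.add(new Offer("Hotel B", "2011-12-02", 180.0));

        priceStatsByHotel(offers).forEach((hotel, stats) ->
                System.out.println(hotel + ": min=" + stats.getMin() + ", max=" + stats.getMax()));
    }
}
```

This computes the statistics in memory over all offers; the open
question in this mail is how to get the same per-group numbers out of
a grouped Solr search.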

A little googling led us to StatsComponent, which does what we need,
except that we would need it applied to each group. There was a
thread on this list in August, "Grouping and performing statistics per
group", that went into this a bit, but it didn't reach a solution.

Is this possible in Solr 3.4, either with StatsComponent, or some other way?

-- 
Morten
We all live in a yellow subroutine.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Stats per group with StatsComponent?

Posted by Morten Lied Johansen <mo...@ifi.uio.no>.
On 03.12.2011 10:50, Martijn v Groningen wrote:
> Hi Morten,
>
> You can also take a look at:
> https://issues.apache.org/jira/browse/LUCENE-3444
>
> That is also a second pass collector. It collects all unique terms for
> a specified field for all top N groups.
> This is just the Lucene side. After it is committed it also needs to be
> wired up in Solr.

Thanks. We have decided to go a slightly different route with our 
initial problem in order to make a deadline that is coming up fast, so 
the work on the stats is being delayed. We hope to return to this in a 
couple of months, and I'll be sure to look at LUCENE-3444 at that point.

-- 
Morten Lied Johansen
Trees hit cars only in self-defence.



Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi Morten,

You can also take a look at:
https://issues.apache.org/jira/browse/LUCENE-3444

That is also a second pass collector. It collects all unique terms for
a specified field for all top N groups.
This is just the Lucene side. After it is committed it also needs to be
wired up in Solr.
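
The idea of collecting the unique values of a field for the top N
groups can be sketched in plain Java as follows. This is not the
LUCENE-3444 code and uses no Lucene classes; DistinctValuesPerGroup,
its method and the sample data are hypothetical names chosen for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class DistinctValuesPerGroup {
    // docs: each entry is {groupValue, fieldValue}.
    // topGroups: the groups selected by an earlier (first) pass.
    static Map<String, SortedSet<String>> collect(List<String[]> docs, Set<String> topGroups) {
        Map<String, SortedSet<String>> result = new LinkedHashMap<>();
        for (String group : topGroups) {
            result.put(group, new TreeSet<>());
        }
        for (String[] doc : docs) {
            SortedSet<String> values = result.get(doc[0]);
            if (values != null) {
                values.add(doc[1]); // documents outside the top groups are skipped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"Hotel A", "2011-12-01"});
        docs.add(new String[] {"Hotel A", "2011-12-02"});
        docs.add(new String[] {"Hotel A", "2011-12-01"});
        docs.add(new String[] {"Hotel B", "2011-12-05"});
        docs.add(new String[] {"Hotel C", "2011-12-09"});

        Set<String> topGroups = new LinkedHashSet<>(Arrays.asList("Hotel A", "Hotel B"));
        System.out.println(collect(docs, topGroups));
    }
}
```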

Martijn




-- 
Kind regards,

Martijn van Groningen



Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi Morten,

> As far as I understand, I need to create a subclass of
> AbstractSecondPassGroupingCollector, and for each group maintain some sort
> of structure to hold on to my values.
This class is meant for collecting top N documents inside a group. The
reason it is abstract is that it can get its group values from
different sources, such as indexed terms, function results and indexed
docvalues.
I think there should be a new collector type for computing min / max.
This would also be a second pass collector, because it depends on the
top SearchGroups collected by a concrete impl of
AbstractFirstPassGroupingCollector.

> As to getting the values, I think I understand how it works, but if anyone
> could point me towards some documentation about how the AtomicReaderContext
> works, and how to read specific fields, that would be great.
Well there is the javadoc :) but what is important to remember is that
all the grouping collectors work per segment. A collector needs the
AtomicReaderContext to get the values to do grouping for each segment.
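
A rough sketch of what working per segment means, in plain Java with
no Lucene classes: the collector is told about each segment before
that segment's documents are collected, and reads values by
segment-local document id. Segment, PerSegmentMinMax and their
methods are hypothetical stand-ins, only loosely shaped like a Lucene
collector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one index segment. Values are addressed by
// segment-local document id; docBase is the global id of local document 0.
final class Segment {
    final int docBase;
    final String[] hotels;
    final double[] prices;

    Segment(int docBase, String[] hotels, double[] prices) {
        this.docBase = docBase;
        this.hotels = hotels;
        this.prices = prices;
    }
}

final class PerSegmentMinMax {
    // group value -> {min, max}
    final Map<String, double[]> minMax = new LinkedHashMap<>();
    private Segment current;

    // Called once before the documents of each segment are collected.
    void setNextSegment(Segment segment) {
        current = segment;
    }

    // Called per matching document with its segment-local id.
    void collect(int localDoc) {
        String hotel = current.hotels[localDoc];
        double price = current.prices[localDoc];
        double[] range = minMax.get(hotel);
        if (range == null) {
            minMax.put(hotel, new double[] {price, price});
        } else {
            range[0] = Math.min(range[0], price);
            range[1] = Math.max(range[1], price);
        }
    }

    // Converts a segment-local id to a global id, should a caller need it.
    int globalDoc(int localDoc) {
        return current.docBase + localDoc;
    }

    public static void main(String[] args) {
        Segment first = new Segment(0, new String[] {"A", "B"}, new double[] {120.0, 210.0});
        Segment second = new Segment(2, new String[] {"A", "B", "A"},
                new double[] {95.0, 180.0, 140.0});

        PerSegmentMinMax collector = new PerSegmentMinMax();
        for (Segment segment : new Segment[] {first, second}) {
            collector.setNextSegment(segment);
            for (int doc = 0; doc < segment.prices.length; doc++) {
                collector.collect(doc);
            }
        }
        collector.minMax.forEach((hotel, range) ->
                System.out.println(hotel + ": min=" + range[0] + ", max=" + range[1]));
    }
}
```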

> My biggest question at the moment is how to get my values into the response?
>
> I was thinking I should create a new Grouping.Command that did this, but
> then it seems I can't include the values directly with each group (the lists
> in the "groups" element), but would need to add a separate structure with
> the values for each group. Am I right in that assumption? How can I add more
> values to the lists in the "groups" element? Which behavior would be
> preferred?
>
>
> I was hoping to end up with a response that looks sort of like the attached
> XML.
I also think the statistics section should be included in each group,
like in your attached response example.
The Grouping.Command class could get a new method getStatsCollector()
which returns zero or more collectors.
Each of these collectors is executed in the second search; then, in the
addDocList method, the result of each collector can
be put in the response.
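
As a very rough illustration of that shape, in plain Java rather than
Solr code: a command exposes its stats collectors, and the response
builder nests each collector's result inside the matching group. All
types here (GroupStatsCollector, MinCollector, MaxCollector,
StatsCommand) and the "doclist" / "stats" keys are hypothetical and
are not taken from the Solr source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: a collector fed (group, value) pairs during the second search.
interface GroupStatsCollector {
    String name();
    void collect(String group, double value);
    Double resultFor(String group);
}

final class MinCollector implements GroupStatsCollector {
    private final Map<String, Double> min = new LinkedHashMap<>();
    public String name() { return "min"; }
    public void collect(String group, double value) { min.merge(group, value, Double::min); }
    public Double resultFor(String group) { return min.get(group); }
}

final class MaxCollector implements GroupStatsCollector {
    private final Map<String, Double> max = new LinkedHashMap<>();
    public String name() { return "max"; }
    public void collect(String group, double value) { max.merge(group, value, Double::max); }
    public Double resultFor(String group) { return max.get(group); }
}

// Hypothetical stand-in for a grouping command that exposes its stats collectors.
final class StatsCommand {
    private final List<GroupStatsCollector> collectors =
            Arrays.asList(new MinCollector(), new MaxCollector());

    // Zero or more collectors to run in the second search.
    List<GroupStatsCollector> getStatsCollectors() {
        return collectors;
    }

    // Builds one entry per group: its doc list plus a nested "stats" section.
    Map<String, Map<String, Object>> buildResponse(Map<String, List<Integer>> docLists) {
        Map<String, Map<String, Object>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : docLists.entrySet()) {
            Map<String, Object> stats = new LinkedHashMap<>();
            for (GroupStatsCollector collector : getStatsCollectors()) {
                stats.put(collector.name(), collector.resultFor(entry.getKey()));
            }
            Map<String, Object> group = new LinkedHashMap<>();
            group.put("doclist", entry.getValue());
            group.put("stats", stats);
            groups.put(entry.getKey(), group);
        }
        return groups;
    }

    public static void main(String[] args) {
        StatsCommand command = new StatsCommand();
        double[] prices = {120.0, 95.0, 140.0};
        for (double price : prices) {
            for (GroupStatsCollector collector : command.getStatsCollectors()) {
                collector.collect("Hotel A", price);
            }
        }
        Map<String, List<Integer>> docLists = new LinkedHashMap<>();
        docLists.put("Hotel A", Arrays.asList(0, 1, 2));
        System.out.println(command.buildResponse(docLists));
    }
}
```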

Martijn



Re: Stats per group with StatsComponent?

Posted by Morten Lied Johansen <mo...@ifi.uio.no>.
On 30. nov. 2011 14:58, Martijn v Groningen wrote:
> You'll need to create a new second pass collector that computes the
> min / max for the top N groups. This collector then needs to be wired
> up in Solr. The AbstractSecondPassGroupingCollector is something you
> can take a look at. It collects the top documents for the top N groups.

I've spent some time looking at this code, and I could use a few more 
pointers to see if my assumptions are right, and get some idea of where 
I'm headed.

As far as I understand, I need to create a subclass of 
AbstractSecondPassGroupingCollector, and for each group maintain some 
sort of structure to hold on to my values.

As to getting the values, I think I understand how it works, but if 
anyone could point me towards some documentation about how the 
AtomicReaderContext works, and how to read specific fields, that would 
be great.

My biggest question at the moment is how to get my values into the response?

I was thinking I should create a new Grouping.Command that did this, 
but then it seems I can't include the values directly with each group 
(the lists in the "groups" element), but would need to add a separate 
structure with the values for each group. Am I right in that assumption? 
How can I add more values to the lists in the "groups" element? Which 
behavior would be preferred?


I was hoping to end up with a response that looks sort of like the 
attached XML.

-- 
Morten
We all live in a yellow subroutine.

Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.
Looks fine!

Martijn

On 30 November 2011 15:25, Morten Lied Johansen <mo...@ifi.uio.no> wrote:
> On 30. nov. 2011 14:58, Martijn v Groningen wrote:
>>
>>
>
>> With the StatsComponent this isn't possible at the moment. The
>> StatsComponent will give you the min / max of a field for the whole
>> query result.
>> If you want the min / max value per group you'll need to do some
>> coding. The grouping logic is executed inside Lucene collectors
>> located in the grouping module. You'll need to create a new second
>> pass collector that computes the min / max for the top N groups. This
>> collector then needs to be wired up in Solr. The
>> AbstractSecondPassGroupingCollector is something you can take a look
>> at. It collects the top documents for the top N groups.
>
>
> Thank you for your reply. We'll have a look at this and see if we can get
> something going this week.
>
>
>> You don't need to have a patch to open an issue. Just open an issue
>> with a good description and maybe some implementation details.
>
>
> I have created an issue, SOLR-2931. Let me know if I should add some more
> details to it. We will update it and follow any discussions as we work.
>
>
> --
> Morten
> We all live in a yellow subroutine.
>
>



-- 
Kind regards,

Martijn van Groningen



Re: Stats per group with StatsComponent?

Posted by Morten Lied Johansen <mo...@ifi.uio.no>.
On 30. nov. 2011 14:58, Martijn v Groningen wrote:
>

> With the StatsComponent this isn't possible at the moment. The
> StatsComponent will give you the min / max of a field for the whole
> query result.
> If you want the min / max value per group you'll need to do some
> coding. The grouping logic is executed inside Lucene collectors
> located in the grouping module. You'll need to create a new second
> pass collector that computes the min / max for the top N groups. This
> collector then needs to be wired up in Solr. The
> AbstractSecondPassGroupingCollector is something you can take a look
> at. It collects the top documents for the top N groups.

Thank you for your reply. We'll have a look at this and see if we can 
get something going this week.

> You don't need to have a patch to open an issue. Just open an issue
> with a good description and maybe some implementation details.

I have created an issue, SOLR-2931. Let me know if I should add some 
more details to it. We will update it and follow any discussions as we work.

-- 
Morten
We all live in a yellow subroutine.



Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi Morten,

I missed your question on the user mailing list. Here is my answer:

With the StatsComponent this isn't possible at the moment. The
StatsComponent will give you the min / max of a field for the whole
query result.
If you want the min / max value per group you'll need to do some
coding. The grouping logic is executed inside Lucene collectors
located in the grouping module. You'll need to create a new second
pass collector that computes the min / max for the top N groups. This
collector then needs to be wired up in Solr. The
AbstractSecondPassGroupingCollector is something you can take a look
at. It collects the top documents for the top N groups.
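
The two-pass idea can be sketched in plain Java, without any Lucene
classes: a first pass picks the top N groups (ranked here by each
group's best score, which is an assumption made for this sketch), and
a second pass computes min / max only for documents in those groups.
ScoredDoc and TwoPassGroupStats are hypothetical names; a real
implementation would be a Lucene collector in the grouping module.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit: its group, its score and the field value.
final class ScoredDoc {
    final String group;
    final float score;
    final double value;

    ScoredDoc(String group, float score, double value) {
        this.group = group;
        this.score = score;
        this.value = value;
    }
}

final class TwoPassGroupStats {
    // First pass: remember the best score per group and keep the top n groups.
    static List<String> topGroups(List<ScoredDoc> hits, int n) {
        Map<String, Float> best = new HashMap<>();
        for (ScoredDoc doc : hits) {
            best.merge(doc.group, doc.score, Float::max);
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return new ArrayList<>(groups.subList(0, Math.min(n, groups.size())));
    }

    // Second pass: compute {min, max} of the value, only for the top groups.
    static Map<String, double[]> minMax(List<ScoredDoc> hits, List<String> topGroups) {
        Map<String, double[]> stats = new LinkedHashMap<>();
        for (String group : topGroups) {
            stats.put(group, new double[] {Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY});
        }
        for (ScoredDoc doc : hits) {
            double[] range = stats.get(doc.group);
            if (range == null) {
                continue; // not one of the top N groups
            }
            range[0] = Math.min(range[0], doc.value);
            range[1] = Math.max(range[1], doc.value);
        }
        return stats;
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = new ArrayList<>();
        hits.add(new ScoredDoc("Hotel A", 0.9f, 120.0));
        hits.add(new ScoredDoc("Hotel B", 0.5f, 210.0));
        hits.add(new ScoredDoc("Hotel A", 0.3f, 95.0));
        hits.add(new ScoredDoc("Hotel C", 0.7f, 60.0));

        List<String> top = topGroups(hits, 2);
        minMax(hits, top).forEach((group, range) ->
                System.out.println(group + ": min=" + range[0] + ", max=" + range[1]));
    }
}
```

Restricting the second pass to the top N groups keeps the bookkeeping
bounded by N rather than by the total number of groups in the result.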

You don't need to have a patch to open an issue. Just open an issue
with a good description and maybe some implementation details.

Martijn




-- 
Kind regards,

Martijn van Groningen
