Posted to dev@lucene.apache.org by Morten Lied Johansen <mo...@ifi.uio.no> on 2011/12/01 14:27:43 UTC

Re: Stats per group with StatsComponent?

On 30. nov. 2011 14:58, Martijn v Groningen wrote:
> You'll need to create a new second pass collector that computes the
> min / max for the top N groups. This collector then needs to be wired
> up in Solr. The AbstractSecondPassGroupingCollector is something you
> can take a look at. It collects the top documents for the top N
> groups.

I've spent some time looking at this code, and I could use a few more 
pointers to see if my assumptions are right, and get some idea of where 
I'm headed.

As far as I understand, I need to create a subclass of 
AbstractSecondPassGroupingCollector, and for each group maintain some 
sort of structure to hold on to my values.

As to getting the values, I think I understand how it works, but if 
anyone could point me towards some documentation about how the 
AtomicReaderContext works, and how to read specific fields, that would 
be great.

My biggest question at the moment is how to get my values into the response?

I was thinking I should create a new Grouping.Command that did this, 
but then it seems I can't include the values directly with each group 
(the lists in the "groups" element), and would instead need to add a 
separate structure with the values for each group. Am I right in that 
assumption? How can I add more values to the lists in the "groups" 
element? Which behavior would be preferred?

I was hoping to end up with a response that looks sort of like the 
attached XML.

-- 
Morten
We all live in a yellow subroutine.

Re: Stats per group with StatsComponent?

Posted by Morten Lied Johansen <mo...@ifi.uio.no>.

On 03.12.2011 10:50, Martijn v Groningen wrote:
> Hi Morten,
>
> You can also take a look at:
> https://issues.apache.org/jira/browse/LUCENE-3444
>
> That is also a second pass collector. It collects all unique terms for
> a specified field for all top N groups.
> This is just the Lucene side. After it is committed it also needs to be
> wired up in Solr.

Thanks. We have decided to go a slightly different route with our 
initial problem in order to make a deadline that is coming up fast, so 
the work on the stats is being delayed. We hope to return to this in a 
couple of months, and I'll be sure to look at LUCENE-3444 at that point.

-- 
Morten Lied Johansen
Trees hit cars only in self-defence.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.

Hi Morten,

You can also take a look at:
https://issues.apache.org/jira/browse/LUCENE-3444

That is also a second pass collector. It collects all unique terms for
a specified field for all top N groups.
This is just the Lucene side. After it is committed it also needs to be
wired up in Solr.

Martijn

On 2 December 2011 11:46, Martijn v Groningen
<ma...@gmail.com> wrote:
> Hi Morten,
>
>> As far as I understand, I need to create a subclass of
>> AbstractSecondPassGroupingCollector, and for each group maintain some sort
>> of structure to hold on to my values.
> This class is meant for collecting top N documents inside a group. The
> reason it is abstract is that it can get its group values from
> different sources, such as indexed terms, function results and indexed
> docvalues.
> I think there should be a new collector type for computing min / max.
> This is also a second pass collector b/c it depends on the top
> SearchGroups collected by a concrete impl of
> AbstractFirstPassGroupingCollector.
>
>> As to getting the values, I think I understand how it works, but if anyone
>> could point me towards some documentation about how the AtomicReaderContext
>> works, and how to read specific fields, that would be great.
> Well, there is the javadoc :) but what is important to remember is that
> all the grouping collectors work per segment. Each collector needs the
> AtomicReaderContext to get the values to do grouping for each segment.
>
>> My biggest question at the moment is how to get my values into the response?
>>
>> I was thinking I should create a new Grouping.Command that did this, but
>> then it seems I can't include the values directly with each group (the lists
>> in the "groups" element), but would need to add a separate structure with
>> the values for each group. Am I right in that assumption? How can I add more
>> values to the lists in the "groups" element? Which behavior would be
>> preferred?
>>
>>
>> I was hoping to end up with a response that looks sort of like the attached
>> XML.
> I also think the statistics section should be included in each group,
> like in your attached response example.
> Suppose the Grouping.Command class gets a new method getStatsCollector()
> that returns zero or more collectors.
> Each of these collectors is executed in the second search, and then in
> the addDocList method the result of each collector can
> be put in the response.
>
> Martijn



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Stats per group with StatsComponent?

Posted by Martijn v Groningen <ma...@gmail.com>.

Hi Morten,

> As far as I understand, I need to create a subclass of
> AbstractSecondPassGroupingCollector, and for each group maintain some sort
> of structure to hold on to my values.
This class is meant for collecting top N documents inside a group. The
reason it is abstract is that it can get its group values from
different sources, such as indexed terms, function results and indexed
docvalues.
I think there should be a new collector type for computing min / max.
This is also a second pass collector b/c it depends on the top
SearchGroups collected by a concrete impl of
AbstractFirstPassGroupingCollector.
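
To make the idea concrete, here is a minimal plain-Java sketch of the
bookkeeping such a collector would need. It does not use the Lucene
grouping API: the names MinMax, GroupMinMaxSketch, collect and statsFor
are hypothetical, and group values are simplified to strings. The only
point it illustrates is that the collector is seeded with the top groups
from the first pass and ignores documents that belong to any other group.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Running min / max for one group (hypothetical helper, not a Lucene class). */
final class MinMax {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    int count = 0;

    void accept(double value) {
        if (value < min) min = value;
        if (value > max) max = value;
        count++;
    }
}

/** Hypothetical second-pass collector: tracks only the groups chosen by the first pass. */
final class GroupMinMaxSketch {
    private final Map<String, MinMax> statsByGroup = new HashMap<>();

    GroupMinMaxSketch(List<String> topGroups) {
        // One accumulator per top group; other groups get no entry at all.
        for (String group : topGroups) {
            statsByGroup.put(group, new MinMax());
        }
    }

    /** Called once per matching document; documents outside the top groups are skipped. */
    void collect(String groupValue, double fieldValue) {
        MinMax stats = statsByGroup.get(groupValue);
        if (stats != null) {
            stats.accept(fieldValue);
        }
    }

    MinMax statsFor(String groupValue) {
        return statsByGroup.get(groupValue);
    }

    public static void main(String[] args) {
        GroupMinMaxSketch collector = new GroupMinMaxSketch(Arrays.asList("books", "music"));
        collector.collect("books", 12.5);
        collector.collect("books", 7.0);
        collector.collect("music", 3.0);
        collector.collect("games", 99.0); // not a top group, so it is ignored
        MinMax books = collector.statsFor("books");
        System.out.println("books min=" + books.min + " max=" + books.max);
        // prints: books min=7.0 max=12.5
    }
}
```

A real implementation would read the group value and the field value
from the index for each collected document rather than receiving them as
arguments.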

> As to getting the values, I think I understand how it works, but if anyone
> could point me towards some documentation about how the AtomicReaderContext
> works, and how to read specific fields, that would be great.
Well, there is the javadoc :) but what is important to remember is that
all the grouping collectors work per segment. Each collector needs the
AtomicReaderContext to get the values to do grouping for each segment.
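
As a rough illustration of what "per segment" means, the plain-Java
sketch below uses stand-in types instead of the real Lucene classes.
SegmentSketch, PerSegmentCollectorSketch, setNextSegment and the prices
array are all hypothetical. The assumption it encodes is that a
collector is first told which segment is current, and then receives
document IDs that are relative to that segment, so values must be looked
up in the current segment and a segment offset is added only when an
index-wide ID is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for one index segment: documents are numbered from 0 within it. */
final class SegmentSketch {
    final int docBase;     // offset of this segment's first document in the whole index
    final double[] prices; // per-document values, indexed by segment-local doc ID

    SegmentSketch(int docBase, double[] prices) {
        this.docBase = docBase;
        this.prices = prices;
    }
}

/** Hypothetical collector that is told which segment is current before it sees documents. */
final class PerSegmentCollectorSketch {
    private SegmentSketch current;
    final List<Integer> globalDocs = new ArrayList<>();
    double max = Double.NEGATIVE_INFINITY;

    void setNextSegment(SegmentSketch segment) {
        current = segment;
    }

    /** segmentDoc is relative to the current segment, not to the whole index. */
    void collect(int segmentDoc) {
        max = Math.max(max, current.prices[segmentDoc]);
        globalDocs.add(current.docBase + segmentDoc);
    }

    public static void main(String[] args) {
        SegmentSketch first = new SegmentSketch(0, new double[] {4.0, 9.0});
        SegmentSketch second = new SegmentSketch(2, new double[] {15.0});

        PerSegmentCollectorSketch collector = new PerSegmentCollectorSketch();
        collector.setNextSegment(first);
        collector.collect(1); // second document of the first segment
        collector.setNextSegment(second);
        collector.collect(0); // first document of the second segment

        System.out.println(collector.globalDocs + " max=" + collector.max);
        // prints: [1, 2] max=15.0
    }
}
```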

> My biggest question at the moment is how to get my values into the response?
>
> I was thinking I should create a new Grouping.Command that did this, but
> then it seems I can't include the values directly with each group (the lists
> in the "groups" element), but would need to add a separate structure with
> the values for each group. Am I right in that assumption? How can I add more
> values to the lists in the "groups" element? Which behavior would be
> preferred?
>
>
> I was hoping to end up with a response that looks sort of like the attached
> XML.
I also think the statistics section should be included in each group,
like in your attached response example.
Suppose the Grouping.Command class gets a new method getStatsCollector()
that returns zero or more collectors.
Each of these collectors is executed in the second search, and then in
the addDocList method the result of each collector can
be put in the response.
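
The attached XML is not part of this archive, so the layout below is
only a guess at the shape being discussed. It is a plain-Java sketch
that builds one entry of the "groups" list with a "stats" section next
to the document list. GroupResponseSketch and groupEntry are
hypothetical names, the "stats" key with "min" and "max" is an
assumption, and ordinary maps stand in for whatever structures Solr
uses to build its response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupResponseSketch {
    /** Builds one entry of the "groups" list, with a hypothetical "stats" section. */
    static Map<String, Object> groupEntry(String groupValue, double min, double max,
                                          List<Integer> docIds) {
        // LinkedHashMap keeps insertion order, so the printed layout is predictable.
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("min", min);
        stats.put("max", max);

        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("groupValue", groupValue);
        entry.put("stats", stats);     // per-group statistics live inside the group
        entry.put("doclist", docIds);  // the group's top documents
        return entry;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> groups = new ArrayList<>();
        groups.add(groupEntry("books", 7.0, 12.5, Arrays.asList(3, 8)));
        System.out.println(groups);
        // prints: [{groupValue=books, stats={min=7.0, max=12.5}, doclist=[3, 8]}]
    }
}
```

Keeping the statistics inside each group entry means a client never has
to match a separate stats structure back to its group.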

Martijn
