Posted to dev@lucene.apache.org by Stephen GRAY <st...@immi.gov.au> on 2013/10/18 07:57:21 UTC

Creating a SumFacetRequest class [SEC=UNOFFICIAL]

Hi everyone,

I need to get the sum of the values of an int field across all the documents in a facet. Because Lucene only provides a CountFacetRequest, I am trying to write a SumFacetRequest with an associated Aggregator that does this. However, the results I get when I use my SumFacetRequest are not correct.

Here is the aggregate method from the Aggregator I have written (based on CountingAggregator):

@Override
public void aggregate(int docID, float score, IntsRef ordinals) throws IOException {
  Document doc = searcher.doc(docID);
  int value = doc.getField(fieldName).numericValue().intValue();

  for (int i = 0; i < ordinals.length; i++) {
    sumArray[ordinals.ints[i]] += value;
  }
}

Would someone be able to tell me if this is correct? I have been assuming that ordinals.ints[i] returns an id for a facet that contains the document but maybe this is not correct.

Any help would be greatly appreciated.

Apologies if this is not the correct forum to post this.

Thanks,
Steve




Re: Creating a SumFacetRequest class [SEC=UNOFFICIAL]

Posted by Shai Erera <se...@gmail.com>.
Hi Stephen,

The code seems correct in general (I have some comments below). The
ordinals that you get are those that are associated with that document
(docID). I assume this is not the newest Lucene though, right?
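
To make the per-ordinal summation concrete, here is a minimal, Lucene-free sketch. IntSlice is a hypothetical stand-in for IntsRef, whose valid entries are ints[offset .. offset + length); the class and method names are made up for illustration. The loop starts at offset rather than 0, which only matters if the buffer is shared or reused. Whether a non-zero offset actually occurs in the Lucene version used here is not something this thread establishes.

```java
import java.util.Arrays;

public class OrdinalSumSketch {
    /** Hypothetical stand-in for IntsRef: valid entries are ints[offset .. offset + length). */
    public static final class IntSlice {
        public final int[] ints;
        public final int offset;
        public final int length;

        public IntSlice(int[] ints, int offset, int length) {
            this.ints = ints;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Adds the document's value to the running sum of every ordinal in the slice. */
    public static void aggregate(long[] sums, IntSlice ordinals, int value) {
        int end = ordinals.offset + ordinals.length;
        for (int i = ordinals.offset; i < end; i++) {
            sums[ordinals.ints[i]] += value;
        }
    }

    public static void main(String[] args) {
        long[] sums = new long[5];
        // Document A: value 10, belongs to ordinals 1 and 3.
        aggregate(sums, new IntSlice(new int[] {1, 3}, 0, 2), 10);
        // Document B: value 7, ordinals 3 and 4, held in a larger buffer at offset 2.
        aggregate(sums, new IntSlice(new int[] {9, 9, 3, 4}, 2, 2), 7);
        System.out.println(Arrays.toString(sums)); // prints [0, 10, 0, 17, 7]
    }
}
```

Printing the sums for two or three hand-built documents like this, next to what the real aggregator produces, is one way to narrow down where the numbers diverge.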

Can you boil this down to a simple test case that adds a couple of documents
with the values you would like to aggregate and prints the actual value
each facet gets?

About the code, I see that you read the value from a stored field. I
recommend that you index the value in a NumericDocValuesField instead, as it
is loaded much faster. Your code currently loads all of the document's stored
fields on every call, which is expensive.

Also, if you move up to the latest Lucene (4.5.0), the API is more
segment-oriented, so you're given all matching documents up front, and then
you can ask for their NumericDocValues once while you iterate over them.
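
As a rough illustration of the segment-oriented pattern, here is a minimal, Lucene-free sketch. The Segment class, its fields, and sumPerOrdinal are hypothetical names that only model the idea: each segment exposes a numeric column addressed by segment-relative doc ID, the column is fetched once per segment, and the matching documents are then iterated. It is not the Lucene 4.5 facet API.

```java
import java.util.Arrays;

public class SegmentSumSketch {
    /** Hypothetical model of one index segment; not a Lucene class. */
    public static final class Segment {
        final long[] values;      // numeric column, indexed by segment-relative doc ID
        final int[][] ordinals;   // facet ordinals of each document in the segment
        final int[] matchingDocs; // segment-relative IDs of the documents that matched

        public Segment(long[] values, int[][] ordinals, int[] matchingDocs) {
            this.values = values;
            this.ordinals = ordinals;
            this.matchingDocs = matchingDocs;
        }
    }

    /** Sums the column value of every matching document into each of its ordinals. */
    public static long[] sumPerOrdinal(Segment[] segments, int ordinalCount) {
        long[] sums = new long[ordinalCount];
        for (Segment segment : segments) {
            long[] column = segment.values; // looked up once per segment, not once per document
            for (int doc : segment.matchingDocs) {
                long value = column[doc];
                for (int ord : segment.ordinals[doc]) {
                    sums[ord] += value;
                }
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        Segment first = new Segment(
            new long[] {10, 20}, new int[][] {{1}, {1, 2}}, new int[] {0, 1});
        // In the second segment only doc 0 matched, so the value 100 is ignored.
        Segment second = new Segment(
            new long[] {5, 100}, new int[][] {{2}, {1}}, new int[] {0});
        long[] sums = sumPerOrdinal(new Segment[] {first, second}, 3);
        System.out.println(Arrays.toString(sums)); // prints [0, 30, 25]
    }
}
```

Note that the doc IDs in this model are relative to their segment, so the same ID 0 refers to different documents in the two segments.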

These comments are about efficiency though. As for your original
question, a simple test case demonstrating the problem will help me spot the
issue.

Shai

