Posted to solr-user@lucene.apache.org by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/07/03 19:18:36 UTC

How do I compute and store a field?

Hello,

I'm trying to add a field to my schema that counts the number of terms in a
document. So far I've been computing this value at query time. Is there a way
to compute it only once and store it in the field?

final SolrIndexSearcher searcher = request.getSearcher();
final SolrIndexReader reader = searcher.getReader();
final String content = "content";

// One encoded norm byte per document for the "content" field.
final byte[] norms = reader.norms(content);
final int[] docLengths;
if (norms == null) {
    docLengths = null;
} else {
    docLengths = new int[norms.length];
    int i = 0;
    for (byte b : norms) {
        // Decode the norm byte back to a float.
        float docNorm = searcher.getSimilarity().decodeNormValue(b);
        int docLength = 0;
        if (docNorm != 0) {
            docLength = (int) (1 / docNorm); // reciprocal
        }
        docLengths[i++] = docLength;
    }
...
final NumericField docLenNormField =
        new NumericField(TestQueryResponseWriter.DOC_LENGHT);
docLenNormField.setIntValue(docLengths[id]);
doc.add(docLenNormField);

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).

Re: How do I compute and store a field?

Posted by Gabriele Kahlout <ga...@mysimpatico.com>.
Gee, I was about to post. I figured my issue is that of computing the number
of unique terms per document. One approach to compute that value is to run the
analyzer on the document before calling addDocument and count the number of
tokens.
Then I can invoke addDocument with the computed value of the field.
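
To make the idea concrete, here is a minimal sketch of counting tokens before the document is added. It deliberately uses only the JDK: the analyze method is a hypothetical stand-in for the real index-time Analyzer (it just lowercases and splits on non-alphanumeric characters), the Map stands in for the document, and the field names doc_length and unique_terms are made up for illustration. In a real setup the count would have to come from the same analysis chain the index uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TermCountSketch {

    // Hypothetical stand-in for the index-time Analyzer: lowercases the text
    // and splits it on runs of characters that are neither letters nor digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for building the document: a map of field name to value.
    // The counts are computed once, before the document is handed over.
    static Map<String, Object> buildDocument(String content) {
        List<String> tokens = analyze(content);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content", content);
        doc.put("doc_length", tokens.size());                  // all tokens
        doc.put("unique_terms", new HashSet<>(tokens).size()); // distinct tokens
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = buildDocument("The quick brown fox, the lazy dog.");
        System.out.println("doc_length=" + doc.get("doc_length")
                + " unique_terms=" + doc.get("unique_terms"));
    }
}
```

The sketch stores both the total token count and the distinct count, since the two differ whenever a term repeats.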

The only issue is that I'm assuming that if I use the same Analyzer that
addDocument uses, then the token count will always equal the number of terms
indexed for that document. Is that a right assumption? Is there any
alternative where I don't need to make this assumption?


On Tue, Jul 5, 2011 at 1:29 AM, Markus Jelsma <ma...@openindex.io> wrote:

> You can create a custom update processor. The passed AddUpdateCommand object
> has an accessor to the SolrInputDocument you're about to add. In the
> processAdd method you can add a new field with whatever you want.
>
> The wiki has a good example:
> http://wiki.apache.org/solr/UpdateRequestProcessor




Re: How do I compute and store a field?

Posted by Markus Jelsma <ma...@openindex.io>.
You can create a custom update processor. The passed AddUpdateCommand object 
has an accessor to the SolrInputDocument you're about to add. In the 
processAdd method you can add a new field with whatever you want.

The wiki has a good example:
http://wiki.apache.org/solr/UpdateRequestProcessor
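
As a rough illustration of the pattern described above, here is a JDK-only sketch of a processor chain in which one link adds a computed field and then passes the document on. None of these classes are Solr's: DocProcessor, TermCountProcessor and CollectingProcessor are hypothetical stand-ins, the Map stands in for the SolrInputDocument, and the whitespace split and the doc_length field name are assumptions made for the example. A real implementation would extend Solr's own update processor classes as shown on the wiki page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one link in an update processor chain.
abstract class DocProcessor {
    private final DocProcessor next;

    DocProcessor(DocProcessor next) {
        this.next = next;
    }

    // Default behaviour: hand the document to the next processor, if any.
    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Adds a computed field to the document, then delegates down the chain.
final class TermCountProcessor extends DocProcessor {
    TermCountProcessor(DocProcessor next) {
        super(next);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object content = doc.get("content");
        if (content != null) {
            String text = content.toString().trim();
            // Assumption: whitespace splitting stands in for real analysis.
            int count = text.isEmpty() ? 0 : text.split("\\s+").length;
            doc.put("doc_length", count);
        }
        super.processAdd(doc);
    }
}

// Last link of the chain: stands in for the step that writes to the index.
final class CollectingProcessor extends DocProcessor {
    final List<Map<String, Object>> indexed = new ArrayList<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.add(doc);
    }
}

final class ProcessorChainDemo {
    public static void main(String[] args) {
        CollectingProcessor sink = new CollectingProcessor();
        DocProcessor chain = new TermCountProcessor(sink);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content", "solr computes fields");
        chain.processAdd(doc);

        System.out.println(sink.indexed.get(0));
    }
}
```

The point of the design is that the field is computed once, at add time, so nothing has to be derived at query time.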

