Posted to solr-user@lucene.apache.org by Alexandre Rafalovitch <ar...@gmail.com> on 2013/04/14 06:29:30 UTC

Is any way to return the number of indexed tokens in a field?

Hello,

We seem to have all sorts of functions around tokenized field content, but
I am looking for a simple count/length that can be returned as a
pseudo-field. Does anyone know of one out of the box?

The specific situation is that I am indexing a field for specific regular
expressions that become tokens (in a copyField). Not every field has the
same number of those.

I now want to find the documents that have the maximum number of tokens in
that field (for testing and review), but I can't figure out how. Any help
would be appreciated.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)

Re: Is any way to return the number of indexed tokens in a field?

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Alex,

It's not clear what you need to count: the pre-analyzed values, or the
tokens produced by analysis.
If the former, I suggest looking into something like
FieldLengthUpdateProcessorFactory. In the latter case, you need to override
Similarity.computeNorm(String, FieldInvertState) and encodeNorm/decodeNorm.
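
As a rough illustration of the first option (counting before analysis), the
sketch below counts regular-expression matches on the client side and stores
the count in a separate integer field that can later be sorted on. The
pattern, the field names (text, pattern_count_i) and the helper
add_match_count are all hypothetical and not part of Solr. Note that
FieldLengthUpdateProcessorFactory itself replaces a field's value with its
character length rather than a token count, so this is a stand-in for the
same idea, not a reproduction of that processor.

```python
import re

# Hypothetical pattern: stands in for the regular expressions that become tokens.
PATTERN = re.compile(r"\b[A-Z]{2,}-\d+\b")


def add_match_count(doc, source_field="text", count_field="pattern_count_i"):
    """Return a copy of doc with the number of pattern matches in source_field."""
    enriched = dict(doc)  # leave the caller's document untouched
    enriched[count_field] = len(PATTERN.findall(doc.get(source_field, "")))
    return enriched


# Made-up sample documents, for illustration only.
docs = [
    {"id": "1", "text": "See SOLR-1234 and LUCENE-42 for details."},
    {"id": "2", "text": "No issue keys here."},
    {"id": "3", "text": "SOLR-1, SOLR-2, SOLR-3"},
]

enriched_docs = [add_match_count(d) for d in docs]

# Documents with the most matches first, mirroring a descending sort on the count field.
ranked = sorted(enriched_docs, key=lambda d: d["pattern_count_i"], reverse=True)
```

Assuming the schema has an integer field for the count (for example a *_i
dynamic field), the enriched documents could be indexed as usual and the
documents with the most matches retrieved with sort=pattern_count_i desc.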




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>