Posted to solr-user@lucene.apache.org by Blackknight <Iv...@gmail.com> on 2018/01/31 09:36:56 UTC

Save the document size in to a new field

Hello guys,

I want to add an option to search documents by size, for example to find the
top categories with the biggest documents. I thought about creating a new
update processor which would count the bytes of all fields in the document,
but I don't think it would work well: some fields are stored, some are
indexed, some of them have both flags, and there are copyFields too, which
would also need to be counted...
 
So I think a per-field size counter in an update processor would misreport
the doc size. I am not taking the compression of the index on disk into
account, but I want to get realistic numbers (I can accept an error of
about 10%).
 
Does anyone know what I should do?

I read some posts about saving the size (in bytes) of a document; all of them
were relatively old and had no solution. Maybe Solr has new techniques for
document size counting? :)

Thank you, guys! 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Save the document size in to a new field

Posted by Emir Arnautović <em...@sematext.com>.
With any generic solution there will always be the question of what the document size is: should you count the same field twice if it is indexed in two different ways? Does the size in the index count, or the size of the response?
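
To make that ambiguity concrete, here is a minimal Python sketch (not Solr code) that sums the UTF-8 byte length of every field value and optionally counts copyField targets a second time. The function names, the sample document, and the copy_fields mapping are all hypothetical, and the sketch ignores index structures and compression entirely.

```python
def field_bytes(value):
    """UTF-8 byte length of one field value; multi-valued fields are summed."""
    if isinstance(value, (list, tuple)):
        return sum(field_bytes(v) for v in value)
    return len(str(value).encode("utf-8"))


def estimate_doc_size(doc, copy_fields=None, count_copies=False):
    """Estimate a document's size in bytes from its raw field values.

    copy_fields maps a source field to its copy destinations (hypothetical
    stand-in for the schema's copyField rules). With count_copies=True each
    source value is counted once more per destination.
    """
    copy_fields = copy_fields or {}
    total = sum(field_bytes(v) for v in doc.values())
    if count_copies:
        for source, dests in copy_fields.items():
            if source in doc:
                total += field_bytes(doc[source]) * len(dests)
    return total


# Hypothetical document and copyField rules, for illustration only.
doc = {"id": "42", "title": "Solr", "content": "héllo world"}
copy_fields = {"title": ["text"], "content": ["text"]}

print(estimate_doc_size(doc))
print(estimate_doc_size(doc, copy_fields, count_copies=True))
```

The two calls return different totals for the same document, which is exactly the "what is the document size" question: neither number is wrong, they just answer different definitions.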

If a simplified version works for you, i.e. approximating the doc size by the size of the largest field, e.g. ‘content’, you can use http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html to obtain that size.
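
As a rough illustration of how far off the single-field approximation can be, the Python sketch below compares the length of one dominant field with the combined length of all fields. It assumes lengths are counted in UTF-16 code units, which is how Java's CharSequence.length() counts, so the numbers are characters rather than bytes. The helper names and the sample document are made up for the example.

```python
def utf16_length(text):
    """Length in UTF-16 code units, matching Java's CharSequence.length()."""
    return len(text.encode("utf-16-le")) // 2


def approx_size(doc, field="content"):
    """Approximate the document size by the length of one dominant field."""
    return utf16_length(str(doc.get(field, "")))


def full_size(doc):
    """Sum of the lengths of all field values (single-valued fields only)."""
    return sum(utf16_length(str(v)) for v in doc.values())


def relative_error(doc, field="content"):
    """Fraction of the full size that the single-field approximation misses."""
    actual = full_size(doc)
    return abs(actual - approx_size(doc, field)) / actual if actual else 0.0


# Hypothetical document where 'content' dominates the other fields.
doc = {"id": "42", "title": "Solr doc", "content": "x" * 190}

print(approx_size(doc), full_size(doc), round(relative_error(doc), 3))
```

If the error on a sample of real documents stays within the roughly 10% tolerance mentioned in the question, the single-field length is probably good enough; if small metadata-heavy documents are common, it likely is not.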

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


