Posted to dev@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2010/08/26 18:06:18 UTC

SolrPerformanceFactors wiki page says contradictory things...

Under "Factors affecting memory usage" there is this text:

When processing an "add" command for a document, the standard XML update handler has two limitations:

	• All of the document's fields must simultaneously fit into memory. (Technically, it's actually the sum of min(<the actual field value's length>, maxFieldLength). As such, adjusting maxFieldLength may be of some help.)
		• (I'm assuming that fields are truncated to maxFieldLength before being added to the relevant document object. If that's not true, then maxFieldLength won't help here. --ChrisHarris)
	• Each individual <field>...</field> tag in the input XML must fit into memory, regardless of maxFieldLength.


Bullet 1 contradicts bullet 2, at least the way I read it.

The tokenizer that applies the maxFieldLength cutoff works on a stream. That implies that the first bullet is correct, and that the entire XML document doesn't need to fit into memory. Unless what we are trying to say is that, to parse the incoming XML document, the entire document must fit into memory? After that, the tokenizer kicks in and only min(<the actual field value's length>, maxFieldLength) applies to each field?

Eric


-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: SolrPerformanceFactors wiki page says contradictory things...

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Aug 26, 2010 at 12:06 PM, Eric Pugh
<ep...@opensourceconnections.com> wrote:
> Under "Factors affecting memory usage" there is this text:
>
> When processing an "add" command for a document, the standard XML update handler has two limitations:
>
>        • All of the document's fields must simultaneously fit into memory. (Technically, it's actually the sum of min(<the actual field value's length>, maxFieldLength). As such, adjusting maxFieldLength may be of some help.)
>                • (I'm assuming that fields are truncated to maxFieldLength before being added to the relevant document object. If that's not true, then maxFieldLength won't help here. --ChrisHarris)
>        • Each individual <field>...</field> tag in the input XML must fit into memory, regardless of maxFieldLength.
>
>
> Bullet 1 contradicts bullet 2, at least, the way I read it.
>
> Looking at the tokenizer that applies the maxFieldLength cutoff, it is working with a stream...  That implies that the first bullet is correct, and that the entire XML document doesn't need to fit into memory.  Unless what we are trying to say is that to parse the incoming XML document, the entire document must fit into memory?  After that, the tokenizer kicks in and only the min(<the actual field value's length>, maxFieldLength) applies to each field...?


I think your understanding is correct: maxFieldLength has little to do
with memory use per se - it's the max number of tokens indexed for any
given field in a document.  Of course cutting down the maxFieldLength
will cut down on what Lucene internally stores before flushing a
segment too... but I imagine that's going to be irrelevant to 99.9% of
our users.
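
As a rough illustration of the distinction above, here is a minimal Python sketch (not Lucene or Solr code) of a token cap applied to a stream. The names stream_tokens and index_field are hypothetical, and whitespace splitting stands in for a real analyzer. The only point it makes is that the cap limits how many tokens get indexed; it does nothing about how much text the caller already holds in memory.

```python
import io
import itertools
from typing import Iterator, List, TextIO


def stream_tokens(reader: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield whitespace-delimited tokens, reading the input in small chunks."""
    pending = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        parts = pending.split()
        if pending and not pending[-1].isspace():
            # The chunk ended mid-token: hold the partial token for the next read.
            pending = parts.pop() if parts else ""
        else:
            pending = ""
        yield from parts
    if pending:
        yield pending


def index_field(reader: TextIO, max_field_length: int) -> List[str]:
    """Keep at most max_field_length tokens (hypothetical stand-in for indexing)."""
    return list(itertools.islice(stream_tokens(reader), max_field_length))


if __name__ == "__main__":
    # The whole field value is already a string in memory here, as it would be
    # after XML parsing; the cap only affects the number of tokens kept.
    field_value = "one two three four five"
    print(index_field(io.StringIO(field_value), max_field_length=3))
```

Because islice stops pulling from the generator once the cap is reached, the rest of the stream is never tokenized, but the full field_value string was still allocated before tokenization began.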

Maybe this whole thing should be cut down to "All of the document's
fields must currently simultaneously fit into memory.", if it's even
worth mentioning at all.  Can you clean this up, Eric?

-Yonik
http://lucenerevolution.org   Lucene/Solr Conference, Boston Oct 7-8
