Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/03/15 07:24:41 UTC
[jira] [Updated] (SOLR-10286) Declare a field as "large", don't keep value in the document cache
[ https://issues.apache.org/jira/browse/SOLR-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-10286:
--------------------------------
Attachment: SOLR-10286_large_fields.patch
Here's a patch. The whole test suite has passed twice now.
Nocommits:
* SchemaField.isLarge currently defaults to true for the purposes of testing during development of this feature. This was extremely useful. It should of course default to false.
* SolrIndexSearcher: I want to refactor/move all the Lucene Document related code and docValuesAsStored type code out of here into a new companion class named {{SolrDocumentFetcher}}. I didn't do that yet as I want the patch to show what's changed more clearly.
* BaseEditorialTransformer (related to query elevation): This made bad instanceof assumptions that I fixed, but I found the code too loosey-goosey for my liking in toString'ing whatever it didn't understand (I hate that in general; it leads to hard-to-find bugs). I don't think that case can happen, so I added an "assert false". Now that I see all tests pass, I'm inclined to make it fail hard.
Schema package:
* FieldProperties: converted the bit masks from hex to Java's binary literals (available since Java 7). Much clearer!
* Question: [~yonik@apache.org] what is {{BINARY}} for? This isn't used anywhere and the line of code dates back to Solr's initial Apache contribution. For a moment I thought I could use it as the same as a BinaryField check but apparently not.
* FieldType.checkSchemaField previously tested only for docValues compatibility, and subclasses would override it with a no-op. I think that design was poor as it's too all-encompassing, so I made it call a new checkSupportsDocValues() and had the applicable subclasses override _that_ instead.
* FieldType.checkSchemaField now checks for "large" compatibility -- stored, not multiValued, not a numeric field. BinaryField overrides it to throw as well, since "large" hasn't been implemented for it yet.
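To make the two schema changes above concrete, here is a minimal sketch: property flags written as binary literals, and a checkSchemaField that delegates to narrow, overridable hooks. All class names ({{FieldFlags}}, {{SketchFieldType}}, and so on), the flag values, and the {{checkSupportsLarge}} hook are hypothetical stand-ins for illustration. They are not the code in the patch.

```java
// Hypothetical stand-ins; not Solr's actual FieldProperties / FieldType classes.
final class FieldFlags {
    // Binary literals show each bit position at a glance (flag values are made up).
    static final int INDEXED     = 0b0000_0001;
    static final int STORED      = 0b0000_0010;
    static final int MULTIVALUED = 0b0000_0100;
    static final int DOC_VALUES  = 0b0000_1000;
    static final int LARGE       = 0b0001_0000;

    private FieldFlags() {}

    static boolean has(int props, int flag) {
        return (props & flag) != 0;
    }
}

class SketchFieldType {
    /** Validates a field's properties, delegating type-specific questions to hooks. */
    void checkSchemaField(String name, int props) {
        if (FieldFlags.has(props, FieldFlags.DOC_VALUES)) {
            checkSupportsDocValues(name);
        }
        if (FieldFlags.has(props, FieldFlags.LARGE)) {
            if (!FieldFlags.has(props, FieldFlags.STORED)) {
                throw new IllegalArgumentException("large field must be stored: " + name);
            }
            if (FieldFlags.has(props, FieldFlags.MULTIVALUED)) {
                throw new IllegalArgumentException("large field must not be multiValued: " + name);
            }
            checkSupportsLarge(name);
        }
    }

    /** Assumed default for this sketch: docValues unsupported unless a subclass opts in. */
    protected void checkSupportsDocValues(String name) {
        throw new IllegalArgumentException("field type does not support docValues: " + name);
    }

    /** Hypothetical hook: "large" is allowed unless a subclass objects. */
    protected void checkSupportsLarge(String name) {
        // supported by default in this sketch
    }
}

class SketchStringType extends SketchFieldType {
    @Override
    protected void checkSupportsDocValues(String name) {
        // string-like types support docValues in this sketch
    }
}

class SketchNumericType extends SketchFieldType {
    @Override
    protected void checkSupportsDocValues(String name) {
        // numeric types support docValues in this sketch
    }

    @Override
    protected void checkSupportsLarge(String name) {
        throw new IllegalArgumentException("numeric field cannot be large: " + name);
    }
}
```

With this shape, a subclass that supports docValues overrides only the small hook and still inherits every other check in checkSchemaField.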
SolrIndexSearcher:
* I refactored the doc() handling to always use a custom StoredFieldVisitor, which I think makes it clearer. This may also make it easier to add a Status.STOP optimization for single-valued fields but I didn't get to that.
* When the Unified/Postings highlighters supply their custom StoredFieldVisitor and match an already cached document's large field, this code will avoid a double-string conversion, reducing heap memory pressure.
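As a rough illustration of the visitor-driven doc() handling and the possible STOP optimization mentioned above, the sketch below uses simplified stand-in types modeled on the YES/NO/STOP statuses of Lucene's StoredFieldVisitor. {{VisitStatus}}, {{SketchFieldVisitor}}, {{SingleValuedFieldCollector}} and {{SketchStoredFields}} are all hypothetical. This is neither the Lucene API nor the code in the patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in modeled on Lucene's StoredFieldVisitor.Status values.
enum VisitStatus { YES, NO, STOP }

// Hypothetical, string-only visitor interface for this sketch.
interface SketchFieldVisitor {
    VisitStatus needsField(String fieldName);

    void stringField(String fieldName, String value);
}

/**
 * Collects the wanted fields and asks the reader to stop once every one of
 * them has been seen. Stopping early is only valid if the wanted fields are
 * single-valued, which is the assumption of this sketch.
 */
final class SingleValuedFieldCollector implements SketchFieldVisitor {
    private final Set<String> wanted;
    final Map<String, String> values = new LinkedHashMap<>();

    SingleValuedFieldCollector(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    public VisitStatus needsField(String fieldName) {
        if (values.size() == wanted.size()) {
            return VisitStatus.STOP; // everything requested has been collected
        }
        return wanted.contains(fieldName) ? VisitStatus.YES : VisitStatus.NO;
    }

    @Override
    public void stringField(String fieldName, String value) {
        values.put(fieldName, value);
    }
}

final class SketchStoredFields {
    private SketchStoredFields() {}

    /**
     * Walks {name, value} pairs in stored order and returns how many fields
     * were examined before the visitor asked to stop.
     */
    static int visit(List<String[]> storedFields, SketchFieldVisitor visitor) {
        int examined = 0;
        for (String[] field : storedFields) {
            VisitStatus status = visitor.needsField(field[0]);
            if (status == VisitStatus.STOP) {
                break;
            }
            examined++;
            if (status == VisitStatus.YES) {
                visitor.stringField(field[0], field[1]);
            }
        }
        return examined;
    }
}
```

If only a small leading field such as the id is requested, the walk ends before a later big field is ever considered, which is the benefit a STOP optimization would aim for.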
Tests:
* The test is pretty basic; good enough? It'd be nice to add a test to the Solr UnifiedHighlighter related stuff to randomly use this field. It's at least an opt-in feature so I'm not too worried... not to mention I ran this with a default large'ness to tease out bugs. I wonder if the default large'ness could/should be flipped randomly by Solr's test infrastructure?
Bugs found/fixed:
* In a couple of places in Solr, there was an assumption that the Lucene {{IndexableField}} was actually an instance of {{Field}}. Two cases are fixed in this patch:
** {{DocumentBuilder.addField}}. It appears in-place updates might not have worked in some cases involving lazy fields, depending on the usage pattern.
** {{BaseEditorialTransformer}} (query elevation).
* RealTimeGetComponent: RTG can internally grab a ref-counted realtime searcher, look up a document, then dec-ref the searcher. If the searcher is subsequently closed, the lazy field can't get the value anymore. Theoretically this problem could happen with Solr's standard lazy fields too, but a "large" field is better at provoking it. I fixed this by essentially copying the IndexableField. It'd be nice if Lucene {{Field}} had a copy constructor that takes an IndexableField; I was forced to subclass to accomplish the same.
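The RealTimeGetComponent problem above can be sketched with a few stand-in types: a lazy field that reads through a ref-counted searcher, and a materialized copy taken while the searcher is still open. {{SketchSearcher}}, {{SketchField}}, {{LazySketchField}} and {{MaterializedSketchField}} are hypothetical names for illustration. The actual fix in the patch subclasses Lucene's {{Field}} instead.

```java
// Hypothetical stand-in for a ref-counted searcher; starts with one reference.
final class SketchSearcher {
    private int refCount = 1;
    private final String storedValue;

    SketchSearcher(String storedValue) {
        this.storedValue = storedValue;
    }

    void incRef() {
        refCount++;
    }

    void decRef() {
        refCount--;
    }

    boolean isClosed() {
        return refCount <= 0;
    }

    String readStoredValue() {
        if (isClosed()) {
            throw new IllegalStateException("searcher is closed");
        }
        return storedValue;
    }
}

// Minimal field abstraction for this sketch.
interface SketchField {
    String stringValue();
}

/** Defers the read, so it only works while the searcher is still open. */
final class LazySketchField implements SketchField {
    private final SketchSearcher searcher;

    LazySketchField(SketchSearcher searcher) {
        this.searcher = searcher;
    }

    @Override
    public String stringValue() {
        return searcher.readStoredValue();
    }
}

/** Copies the value eagerly, so it stays usable after the searcher is released. */
final class MaterializedSketchField implements SketchField {
    private final String value;

    MaterializedSketchField(SketchField source) {
        this.value = source.stringValue(); // read now, while the source can still answer
    }

    @Override
    public String stringValue() {
        return value;
    }
}
```

Copying before the dec-ref trades some heap for safety: the returned document no longer depends on the searcher's lifetime.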
Although not a strict requirement, ideally SOLR-10273 (largest field last) is also done.
> Declare a field as "large", don't keep value in the document cache
> ------------------------------------------------------------------
>
> Key: SOLR-10286
> URL: https://issues.apache.org/jira/browse/SOLR-10286
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Attachments: SOLR-10286_large_fields.patch
>
>
> (part of umbrella issue SOLR-10117)
> This allows a field to be declared as "large" in the schema. In the {{SolrIndexSearcher.doc(...)}} handling, these fields are lazily fetched from Lucene. Unlike {{LazyDocument.LazyField}}, the value is not cached after first use unless it is "small" (< 512KB by default). "large" can only be used when the field has stored="true" and multiValued="false" and is otherwise compatible (basically not a numeric field) -- you'll get a helpful exception if it's unsupported. BinaryField is not yet supported; it could be in the future.
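The caching rule in the issue description above (fetch lazily, keep the value only if it is small) could look roughly like this. {{LargeLazySketchField}} and its loader are hypothetical, and measuring size as UTF-8 byte length is an assumption of this sketch. The patch may measure and cache differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Hypothetical lazy field that caches its value only when it is below a size threshold. */
final class LargeLazySketchField {
    // 512KB default taken from the issue description.
    static final int DEFAULT_CACHE_THRESHOLD_BYTES = 512 * 1024;

    private final Supplier<String> loader; // stands in for the fetch from the index
    private final int thresholdBytes;
    private String cached;                 // stays null for values that are too big
    int loads;                             // how many times the loader actually ran

    LargeLazySketchField(Supplier<String> loader, int thresholdBytes) {
        this.loader = loader;
        this.thresholdBytes = thresholdBytes;
    }

    String stringValue() {
        if (cached != null) {
            return cached;
        }
        String value = loader.get();
        loads++;
        // Assumption: size is measured as UTF-8 bytes; only small values are retained.
        if (value.getBytes(StandardCharsets.UTF_8).length < thresholdBytes) {
            cached = value;
        }
        return value;
    }
}
```

A value under the threshold is loaded once and then served from the cached reference. A value at or over it is re-read on every call, so the cached document never pins it on the heap.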