Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/03/15 07:24:41 UTC
[jira] [Updated] (SOLR-10286) Declare a field as "large", don't keep value in the document cache

     [ https://issues.apache.org/jira/browse/SOLR-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-10286:
--------------------------------
    Attachment: SOLR-10286_large_fields.patch

Here's a patch.  The whole test suite has passed twice now.

Nocommits:
* SchemaField.isLarge currently defaults to true so that the whole test suite exercises this feature during development. This was extremely useful, but it should of course default to false.
* SolrIndexSearcher: I want to refactor/move all the Lucene Document related code and docValuesAsStored type code out of here into a new companion class named {{SolrDocumentFetcher}}.  I didn't do that yet as I want the patch to show what's changed more clearly.
* BaseEditorialTransformer (related to query elevation): This made bad instanceof assumptions that I fixed, but I found the code too loosey-goosey for my liking in how it toString's whatever it doesn't understand (I hate that in general; it leads to hard-to-find bugs). I don't think that case can actually happen, so I added an "assert false". Now that I see all tests pass, I'm inclined to make it fail hard.

Schema package:
* FieldProperties: converted the bit masks from hex to Java's binary literals (e.g. {{0b100}}, available since Java 7). Much clearer!
* Question: [~yonik@apache.org] what is {{BINARY}} for? It isn't used anywhere, and the line of code dates back to Solr's initial Apache contribution. For a moment I thought I could use it as a BinaryField check, but apparently not.
* FieldType.checkSchemaField used to test only for docValues compatibility, and subclasses would override it with a no-op. I think that design was poor because overriding the whole method is too all-encompassing, so I made it call a new checkSupportsDocValues() method and had the applicable subclasses override _that_ instead.
* FieldType.checkSchemaField now also checks "large" compatibility: the field must be stored, must not be multiValued, and must not be numeric. BinaryField overrides it to throw as well, since "large" support for it hasn't been implemented yet.
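To make the schema changes above concrete, here is a minimal, self-contained sketch, assuming a heavily simplified model: the flag values, the class names ({{FieldType}}, {{NumericType}}, {{TextType}}, {{BinaryType}}) and the {{accepts}} helper are hypothetical stand-ins, not Solr's actual FieldProperties/FieldType code. It shows property bits written as binary literals, a narrow checkSupportsDocValues() hook that subclasses override instead of the whole check, and the "large" rules (stored, single-valued, non-numeric, and rejected for binary types).

```java
/** Hypothetical, simplified sketch; not Solr's actual FieldProperties/FieldType code. */
public class LargeFieldCheckSketch {

  // Hypothetical property bits, written as binary literals rather than hex.
  static final int INDEXED     = 0b00001;
  static final int STORED      = 0b00010;
  static final int MULTIVALUED = 0b00100;
  static final int DOC_VALUES  = 0b01000;
  static final int LARGE       = 0b10000;

  /** Hypothetical field type base class. */
  static class FieldType {
    boolean isNumeric() { return false; }

    /** Narrow hook: only types that can't support docValues override this. */
    void checkSupportsDocValues(String field) {
      // supported by default
    }

    void checkSchemaField(String field, int props) {
      if ((props & DOC_VALUES) != 0) {
        checkSupportsDocValues(field);
      }
      if ((props & LARGE) != 0) {
        if ((props & MULTIVALUED) != 0) {
          throw new IllegalArgumentException(field + ": large fields can't be multiValued");
        }
        if ((props & STORED) == 0) {
          throw new IllegalArgumentException(field + ": large fields must be stored");
        }
        if (isNumeric()) {
          throw new IllegalArgumentException(field + ": large fields can't be numeric");
        }
      }
    }
  }

  static class NumericType extends FieldType {
    @Override boolean isNumeric() { return true; }
  }

  static class TextType extends FieldType {
    @Override
    void checkSupportsDocValues(String field) {
      throw new IllegalArgumentException(field + ": docValues not supported");
    }
  }

  /** Passes the generic checks, then rejects "large" because it isn't implemented yet. */
  static class BinaryType extends FieldType {
    @Override
    void checkSchemaField(String field, int props) {
      super.checkSchemaField(field, props);
      if ((props & LARGE) != 0) {
        throw new UnsupportedOperationException(field + ": large binary fields not implemented");
      }
    }
  }

  /** Returns true if the type accepts the given property combination. */
  static boolean accepts(FieldType type, int props) {
    try {
      type.checkSchemaField("f", props);
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(accepts(new FieldType(), STORED | LARGE));               // true
    System.out.println(accepts(new FieldType(), STORED | MULTIVALUED | LARGE)); // false
    System.out.println(accepts(new NumericType(), STORED | LARGE));             // false
    System.out.println(accepts(new TextType(), STORED | DOC_VALUES));           // false
    System.out.println(accepts(new BinaryType(), STORED | LARGE));              // false
  }
}
```

The point of the narrow hook is that a type like {{TextType}} only states what it can't do, and it still inherits every other check (including the "large" rules) without having to remember to call them.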

SolrIndexSearcher:
* I refactored the doc() handling to always use a custom StoredFieldVisitor, which I think makes it clearer.  This may also make it easier to add a Status.STOP optimization for single-valued fields but I didn't get to that.
* When the Unified/Postings highlighters supply their own custom StoredFieldVisitor and it matches a large field of an already-cached document, this code avoids converting the value to a String twice, reducing heap memory pressure.
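The visitor-driven loading described above can be sketched in plain Java. This is an illustration under stated assumptions, not Lucene's StoredFieldVisitor API or Solr's code: {{Status}}, {{FieldVisitor}}, {{visitDocument}} and {{DocVisitor}} are hypothetical stand-ins. The visitor loads requested fields eagerly, records "large" fields as deferred (to be fetched lazily later), and includes the possible STOP optimization mentioned above, which assumes every requested field is single-valued.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of visitor-driven stored-field loading; not Lucene's actual API. */
public class VisitorSketch {

  // Per-field decision: load it, skip it, or stop reading the document.
  enum Status { YES, NO, STOP }

  interface FieldVisitor {
    Status needsField(String name);
    void field(String name, String value);
  }

  // Stand-in for a stored-fields reader: walks fields in on-disk order.
  static void visitDocument(List<Map.Entry<String, String>> storedFields, FieldVisitor visitor) {
    for (Map.Entry<String, String> e : storedFields) {
      Status status = visitor.needsField(e.getKey());
      if (status == Status.STOP) {
        return;
      }
      if (status == Status.YES) {
        visitor.field(e.getKey(), e.getValue());
      }
    }
  }

  // Loads requested fields eagerly but defers "large" ones to a later lazy fetch.
  static class DocVisitor implements FieldVisitor {
    final Set<String> wanted;
    final Set<String> large;
    final Map<String, String> loaded = new LinkedHashMap<>();
    final Set<String> deferred = new LinkedHashSet<>();

    DocVisitor(Set<String> wanted, Set<String> large) {
      this.wanted = wanted;
      this.large = large;
    }

    @Override
    public Status needsField(String name) {
      // Possible STOP optimization, assuming all wanted fields are single-valued:
      // once every wanted field has been seen, nothing further is needed.
      if (loaded.size() + deferred.size() == wanted.size()) {
        return Status.STOP;
      }
      if (!wanted.contains(name)) {
        return Status.NO;
      }
      if (large.contains(name)) {
        deferred.add(name); // don't materialize the value now
        return Status.NO;
      }
      return Status.YES;
    }

    @Override
    public void field(String name, String value) {
      loaded.put(name, value);
    }
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> doc = List.of(
        Map.entry("id", "1"), Map.entry("body", "...huge text..."),
        Map.entry("title", "Hello"), Map.entry("extra", "x"));
    DocVisitor visitor = new DocVisitor(Set.of("id", "title", "body"), Set.of("body"));
    visitDocument(doc, visitor);
    System.out.println(visitor.loaded);   // {id=1, title=Hello}
    System.out.println(visitor.deferred); // [body]
  }
}
```

Note how the STOP check only pays off if the fields that are needed come early in the stored order, which is why reordering fields (e.g. putting the largest field last) would complement it.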

Tests:
* The test is pretty basic; is it good enough? It'd be nice to add a test to the Solr UnifiedHighlighter-related code that randomly uses this field. It's at least an opt-in feature, so I'm not too worried, not to mention that I ran the whole suite with largeness on by default to tease out bugs. I wonder if the default largeness could/should be flipped randomly by Solr's test infrastructure?

Bugs found/fixed:
* In a couple places in Solr, there was an assumption that the Lucene {{IndexableField}} was actually an instance of {{Field}}.  Two cases are seen as fixed in this patch:
** {{DocumentBuilder.addField}}. It appears in-place updates might not have worked in some cases involving lazy fields, depending on the usage pattern.
** {{BaseEditorialTransformer}} (query elevation).
* RealTimeGetComponent: RTG can internally grab a ref-counted realtime searcher, look up a document, then decRef the searcher. If the searcher is subsequently closed, the lazy field can no longer retrieve its value. Theoretically this problem could happen with Solr's standard lazy fields too, but a "large" field is better at provoking it. I fixed this by essentially copying the IndexableField. It'd be nice if Lucene's {{Field}} had a copy-constructor taking an IndexableField; I was forced to subclass to accomplish the same.
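The RealTimeGetComponent problem and its fix can be illustrated with a small model. This is a sketch under assumptions: {{Searcher}}, {{FieldValue}}, {{lazy}} and {{detach}} are hypothetical stand-ins, not Solr or Lucene classes. A lazy value that reads through a ref-counted searcher fails once the searcher is closed, while a copy taken before the decRef keeps working.

```java
/** Hypothetical model of the RTG lazy-field problem; not Solr's actual classes. */
public class DetachLazyFieldSketch {

  /** Stand-in for a ref-counted searcher that becomes unusable once closed. */
  static class Searcher {
    private int refCount = 1; // the initial reference is owned elsewhere
    private boolean closed;

    void incRef() { refCount++; }

    void decRef() {
      if (--refCount == 0) {
        closed = true;
      }
    }

    String readStoredValue(String field) {
      if (closed) {
        throw new IllegalStateException("searcher closed");
      }
      return "value-of-" + field;
    }
  }

  interface FieldValue { String stringValue(); }

  /** A lazy value: reads from the searcher on every access and never caches. */
  static FieldValue lazy(Searcher searcher, String field) {
    return () -> searcher.readStoredValue(field);
  }

  /** The fix in spirit: copy the value while the searcher is still open. */
  static FieldValue detach(FieldValue field) {
    final String copy = field.stringValue();
    return () -> copy;
  }

  public static void main(String[] args) {
    Searcher searcher = new Searcher();
    searcher.incRef();                          // RTG takes a reference
    FieldValue lazyField = lazy(searcher, "body");
    FieldValue safeField = detach(lazyField);   // copied before releasing
    searcher.decRef();                          // RTG releases its reference
    searcher.decRef();                          // owner closes the searcher
    System.out.println(safeField.stringValue()); // value-of-body
    try {
      lazyField.stringValue();
    } catch (IllegalStateException e) {
      System.out.println("lazy field failed: " + e.getMessage());
    }
  }
}
```

Copying trades away the laziness for that one document, which seems acceptable here because RTG is returning the document immediately anyway.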

Although not a strict requirement, ideally SOLR-10273 (largest field last) is also done.

> Declare a field as "large", don't keep value in the document cache
> ------------------------------------------------------------------
>
>                 Key: SOLR-10286
>                 URL: https://issues.apache.org/jira/browse/SOLR-10286
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-10286_large_fields.patch
>
>
> (part of umbrella issue SOLR-10117)
> This allows a field to be declared as "large" in the schema. In the {{SolrIndexSearcher.doc(...)}} handling, these fields are lazily fetched from Lucene. Unlike {{LazyDocument.LazyField}}, the value is not cached after first use unless it is "small" (< 512KB by default). "large" can only be used when the field is stored="true" and multiValued="false" and is otherwise compatible (basically, not a numeric field); you'll get a helpful exception if it's unsupported. BinaryField is not supported at this time; it could be in the future.
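The caching rule in the issue description (fetch lazily, keep the value only if it's "small") can be sketched as follows. This is a minimal sketch, assuming the threshold is measured in bytes of the stored value; {{LargeFieldSketch}}, its {{loader}} and the {{loads}} counter are hypothetical and are not Solr's implementation. It is also not thread-safe.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of the "large field" caching rule; not Solr's actual class. */
public class LargeFieldSketch {

  // Assumption: the 512KB default is measured in bytes of the stored value.
  static final int DEFAULT_THRESHOLD_BYTES = 512 * 1024;

  private final Supplier<byte[]> loader; // re-reads the value from the index on demand
  private final int thresholdBytes;
  private byte[] cached;                 // only set when the value is "small"
  int loads;                             // number of index reads, for illustration

  LargeFieldSketch(Supplier<byte[]> loader, int thresholdBytes) {
    this.loader = loader;
    this.thresholdBytes = thresholdBytes;
  }

  /** Returns the value, fetching it lazily; large values are never retained. */
  byte[] value() {
    if (cached != null) {
      return cached;
    }
    byte[] v = loader.get();
    loads++;
    if (v.length < thresholdBytes) {
      cached = v; // small: keep it, like a normal lazy field
    }
    return v;     // large: the caller uses it, then it can be garbage-collected
  }

  public static void main(String[] args) {
    LargeFieldSketch small = new LargeFieldSketch(() -> new byte[1024], DEFAULT_THRESHOLD_BYTES);
    small.value();
    small.value();
    System.out.println(small.loads); // 1

    LargeFieldSketch big =
        new LargeFieldSketch(() -> new byte[DEFAULT_THRESHOLD_BYTES], DEFAULT_THRESHOLD_BYTES);
    big.value();
    big.value();
    System.out.println(big.loads);   // 2
  }
}
```

The trade-off is re-reading a large value from the index on each access in exchange for not pinning it in the document cache's heap.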



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
