Posted to issues@solr.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2023/10/24 20:17:00 UTC

[jira] [Updated] (SOLR-17052) SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and inefficient

     [ https://issues.apache.org/jira/browse/SOLR-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-17052:
--------------------------------------
    Summary: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and inefficient  (was: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy and should be inverted)

> SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and inefficient
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-17052
>                 URL: https://issues.apache.org/jira/browse/SOLR-17052
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> While getting familiar with the {{SolrCore + CodecFactory + SchemaCodecFactory + FieldType}} related code relevant to SOLR-17045, SOLR-17046, & SOLR-17047, it occurred to me that there are a lot of inefficiencies and kludginess in how {{FieldType}} based "codec overrides" are used (and validated) by {{SchemaCodecFactory}} (and {{SolrCore.initCodec}}):
>  * {{SolrCore.initCodec}} needs to be aware of all the possible ways a {{FieldType}} instance might support codec overrides
>  ** ... so it can fail if any are specified unless the {{CodecFactory instanceOf SolrCoreAware}}
>  *** ... even though that still doesn't ensure the factory supports those field type overrides
>  ** This validation currently just looks at {{getPostingsFormatForField}} & {{getDocValuesFormatForField}}
>  *** ... it's ignorant about {{DenseVectorField}} 's assumptions about being able to override aspects of the {{KnnVectorsFormat}}
>  *** ... and AFAICT, what validation is done doesn't help if the Schema API is used to add new field types (w/ {{postingsFormat}} or {{docValuesFormat}} overrides)
>  * In all of the {{SchemaCodecFactory}} "per-field" methods ({{getPostingsFormatForField}}, {{getDocValuesFormatForField}}, & {{getKnnVectorsFormatForField}}) ...
>  ** ... every call to these methods resolves a {{SchemaField}} instance – even though only the (Solr) {{FieldType}} is needed
>  *** Asking the {{IndexSchema}} for the {{SchemaField}} of a fieldName has more overhead than just asking for the {{FieldType}}
>  *** None of the things these methods care about can be configured on a per-fieldName basis anyway.
>  ** For {{PostingsFormat}} and {{DocValuesFormat}}, every call to these methods repeats the SPI lookup on the "format name" configured on the {{FieldType}} instance
>  ** For {{KnnVectorsFormat}} every call to this method constructs a new {{SolrDelegatingKnnVectorsFormat}} – even though the same instance could be re-used for every field of the same {{FieldType}} instance.
>  * In {{FieldType}} ...
>  ** ... there is no validation anywhere that the {{postingsFormat}} or {{docValuesFormat}} are valid
>  *** ... bogus values only cause a problem when the {{SchemaCodecFactory}} tries to resolve them (when indexing)
>  * In {{DenseVectorField}} ...
>  ** ... {{checkSchemaField}} validates (and logs warnings) based on the {{vectorEncoding}} and {{dimensions}} ...
>  *** ... Even though these validations aren't "field" specific – they are "type" specific, and could be validated in {{DenseVectorField.init()}}
>  ** BUT! ... there is no validation anywhere that the {{knnAlgorithm}} is supported, or that the HNSW options make sense for it
>  *** These are only validated by the {{Codec.getKnnVectorsFormatForField(...)}} impl provided by {{SchemaCodecFactory}} ...
>  **** ... and they are redundantly validated on every call
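
To illustrate the point above about repeated SPI lookups and re-constructed format instances, here is a minimal sketch of resolving a format once per field type and re-using it afterwards. It is not Solr or Lucene code: {{PerTypeFormatCache}}, {{forType}}, and the type names are hypothetical, and the resolver lambda merely stands in for an SPI lookup by format name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in (not part of Solr): memoizes an expensive per-type lookup.
final class PerTypeFormatCache<F> {
  private final Map<String, F> cache = new ConcurrentHashMap<>();
  private final Function<String, F> resolver;

  PerTypeFormatCache(Function<String, F> resolver) {
    this.resolver = resolver;
  }

  // Resolves the format for a type name on first use; later calls return the cached instance.
  F forType(String typeName) {
    return cache.computeIfAbsent(typeName, resolver);
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    PerTypeFormatCache<String> cache = new PerTypeFormatCache<>(name -> {
      lookups.incrementAndGet(); // assumed to represent a costly lookup by format name
      return "format-for-" + name;
    });
    cache.forType("text_general");
    cache.forType("text_general"); // served from the cache, no second lookup
    cache.forType("knn_vector");
    System.out.println("lookups performed: " + lookups.get());
  }
}
```

With three calls across two distinct type names, the resolver runs twice. The same shape would also give a natural place to fail fast on a bogus format name, since a resolver that throws leaves no entry in the map.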



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
