Posted to issues@lucene.apache.org by "Alexandre Rafalovitch (Jira)" <ji...@apache.org> on 2020/09/09 13:10:00 UTC
[jira] [Commented] (SOLR-14701) Deprecate Schemaless Mode (Discussion)

    [ https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192858#comment-17192858 ] 

Alexandre Rafalovitch commented on SOLR-14701:
----------------------------------------------

So, I am doing a quick review of the actual code/configuration to better understand the issue, and a few questions come up:
 # Type widening. For example, the first pass thinks a field is a plong and then it turns out to be a double. Do we actually need multi-level type widening, or is falling back to the default sufficient when there is a mismatch (plong or default, pdouble or default, pdate or default)? The only multi-level example is plong->pdouble->string, if I am not mistaken.
 # Single/Multi-valued. Currently we only support multiValued. If we are tracking, can the field type definitions be single-valued (plong, pdouble), with the actual fields declaring multiValued? What is the value of having two field type definitions for each type? I find it quite confusing, and it makes the actual multiplicity hard to track.
 # Do we only look at new fields, or do we compare with existing schema definitions and note any mismatch by name/multiplicity?
 # Do we have to spit out Managed Schema JSON? Or can we just produce structured output of name:type:multiplicity and let users compose the commands themselves? Or have one output format now and add more formats later? Users may want to hand-edit the schema, or (as is likely) the learning schema with this URP chain is not actually the production schema. And if we are detecting mismatches as in the point above, the precise JSON can get messy. Either way, I am guessing the JSON command generation needs to live in an independent tool to be reusable. Which is nice, but is it in scope?
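
To make the trade-off in the first question concrete, here is a minimal Python sketch of the two widening strategies. It is illustrative only and is not Solr code: the function names are hypothetical, and "string" is assumed to be the default type.

```python
# Illustrative only: type names come from the discussion above;
# "string" is assumed to be the default fallback type.
DEFAULT_TYPE = "string"
WIDENING_CHAIN = ["plong", "pdouble", "string"]


def widen_multi_level(current, observed):
    """Pick the wider type along plong -> pdouble -> string.

    Types outside the chain (e.g. pdate) fall back to the default
    on any mismatch.
    """
    if current in WIDENING_CHAIN and observed in WIDENING_CHAIN:
        wider = max(WIDENING_CHAIN.index(current), WIDENING_CHAIN.index(observed))
        return WIDENING_CHAIN[wider]
    return current if current == observed else DEFAULT_TYPE


def widen_to_default(current, observed):
    """Keep the type only if both observations agree; otherwise use the default."""
    return current if current == observed else DEFAULT_TYPE
```

The only case where the two differ is plong seen together with pdouble: the multi-level version keeps pdouble, while the simpler version drops straight to string.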

My own answers suggest that the right approach is:
 * Widen to default
 * Use single-value FieldTypes and only assign multiValued on fields themselves
 * Look only at new fields (for now), especially since the learning schema may not match the production one
 * Output structured advice and leave room for JSON generation in a future Jira
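
As a rough sketch of how these four answers could fit together, the following Python models a tracker that skips known fields, widens to the default on any mismatch, records multiplicity per field, and emits name:type:multiplicity lines. Everything here is an assumption for illustration (the FieldTracker class, the guess_type rules, the date format, the output labels); it is not how the existing URP chain works.

```python
from datetime import datetime

DEFAULT_TYPE = "string"  # assumed default type


def guess_type(value):
    """Guess a single-valued type name for one raw value (illustrative rules)."""
    if isinstance(value, bool):
        return DEFAULT_TYPE  # booleans are not modeled in this sketch
    if isinstance(value, int):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "pdate"
        except ValueError:
            return DEFAULT_TYPE
    return DEFAULT_TYPE


class FieldTracker:
    """Hypothetical tracker: observes documents, reports advice, creates nothing."""

    def __init__(self, known_fields=()):
        self.known = set(known_fields)  # fields already in the schema are skipped
        self.fields = {}  # name -> {"type": str or None, "multi": bool}

    def observe(self, doc):
        for name, raw in doc.items():
            if name in self.known:
                continue
            values = raw if isinstance(raw, list) else [raw]
            entry = self.fields.setdefault(name, {"type": None, "multi": False})
            if len(values) > 1:
                entry["multi"] = True  # multiplicity lives on the field, not the type
            for value in values:
                guessed = guess_type(value)
                if entry["type"] is None:
                    entry["type"] = guessed
                elif entry["type"] != guessed:
                    entry["type"] = DEFAULT_TYPE  # widen straight to the default

    def report(self):
        """Return sorted name:type:multiplicity lines."""
        lines = []
        for name, entry in sorted(self.fields.items()):
            multiplicity = "multiValued" if entry["multi"] else "singleValued"
            lines.append(f"{name}:{entry['type'] or DEFAULT_TYPE}:{multiplicity}")
        return lines
```

Feeding it a few documents and printing report() gives the kind of structured advice a user could turn into schema commands by hand, or that a separate tool could convert to JSON later.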

Bonus questions:
 * Do the field type definitions actually need to exist in the schema if we never create the real fields that use them (we check right now)? Or does this just become an abstract parsing and mapping exercise?
 * Do we still need copyField instructions if we are not creating the fields ourselves?

Other notes to self:
 * We could probably track rough cardinality as well and/or field size for string (e.g. keep first 10 values/hashes and check length) and give recommendation based on that (for Enum, Faceting, Large field attribute, Lazy Loading in solrconfig.xml, etc). That's again based on focusing on advice, rather than pure schema generation.
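
A minimal sketch of that last idea, assuming a cap of 10 distinct value hashes per field and an arbitrary length threshold for "large". The class name, the thresholds, and the advice wording are all hypothetical.

```python
import hashlib


class StringFieldStats:
    """Hypothetical per-field stats: bounded distinct-value sample plus max length."""

    def __init__(self, sample_limit=10):
        self.sample_limit = sample_limit
        self.hashes = set()      # hashes of the first few distinct values
        self.overflowed = False  # True once more distinct values than the limit are seen
        self.max_length = 0
        self.count = 0

    def observe(self, value):
        self.count += 1
        self.max_length = max(self.max_length, len(value))
        digest = hashlib.sha1(value.encode("utf-8")).hexdigest()
        if digest in self.hashes:
            return
        if len(self.hashes) < self.sample_limit:
            self.hashes.add(digest)
        else:
            self.overflowed = True

    def advice(self, large_threshold=10_000):
        """Return a list of hedged recommendations; thresholds are arbitrary."""
        notes = []
        if not self.overflowed and self.count > len(self.hashes):
            notes.append("low cardinality: consider an enum type or faceting")
        if self.max_length >= large_threshold:
            notes.append("long values: consider the large field attribute or lazy loading")
        return notes
```

Because only a bounded set of hashes is kept, memory per field stays constant; the cost is that cardinality above the cap is reported only as "more than the limit".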

 

 

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Major
>         Attachments: image-2020-08-04-01-35-03-075.png
>
>
> I know this won't be the most popular ticket out there, but I am growing more and more sympathetic to the idea that we should rip many of the freedoms out that cause users more harm than not. One of the freedoms I saw time and time again to cause issues was schemaless mode. It doesn't work as named or documented, so I think it should be deprecated. 
> If you use it in production reliably and in a way that cannot be accomplished another way, I am happy to hear from more knowledgeable folks as to why deprecation is a bad idea. 


