Posted to issues@solr.apache.org by "Thiruvalluvan M. G. (Jira)" <ji...@apache.org> on 2023/05/22 03:58:00 UTC

[jira] [Comment Edited] (SOLR-16810) Under certain situations Solr produces managed schema XML that cannot be loaded

    [ https://issues.apache.org/jira/browse/SOLR-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724755#comment-17724755 ] 

Thiruvalluvan M. G. edited comment on SOLR-16810 at 5/22/23 3:57 AM:
---------------------------------------------------------------------

The actual bug is in the XML handling: if we escape certain characters while writing out XML, we must unescape them while reading it. The escaping should also be unambiguous. My PR takes care of just that. Alternatively, we could stop escaping altogether, but that would not be backward compatible.

We could reject field names that contain control characters, which would avoid this ugly situation for field names. But fixing the escape/unescape asymmetry is independent of that decision, because escaping applies not just to field names but to +all+ XML attribute values.
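
As a rough illustration of what a symmetric, unambiguous scheme could look like, here is a minimal sketch. It is not Solr's XML writer and not the code from the PR: the class name {{AttrEscapeSketch}}, the method names, and the choice of which characters count as control characters are all assumptions made for the example. It writes a literal {{#}} as {{##}} and a control character as {{#nn;}}, and the reader reverses both steps.

```java
// Sketch only: not Solr's actual XML writer, and not the code from the PR.
// The class and method names are hypothetical.
final class AttrEscapeSketch {

    // Escape a literal '#' as "##" and each control character as "#<decimal>;".
    // Treating c < 0x20 and 0x7F as "control characters" is an assumption.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '#') {
                out.append("##");
            } else if (c < 0x20 || c == 0x7F) {
                out.append('#').append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Inverse of escape(): "##" becomes '#', "#<decimal>;" becomes that character.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '#') {
                out.append(c);
                i++;
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '#') {
                out.append('#');
                i += 2;
            } else {
                int end = s.indexOf(';', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated escape at index " + i);
                }
                // parseInt throws NumberFormatException on a malformed escape body.
                out.append((char) Integer.parseInt(s.substring(i + 1, end)));
                i = end + 1;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String control = "a\u0014b"; // contains the control character 0x14
        String literal = "a#20;b";   // contains the literal text "#20;"
        System.out.println(escape(control)); // a#20;b
        System.out.println(escape(literal)); // a##20;b
        System.out.println(unescape(escape(literal)).equals(literal)); // true
    }
}
```

Because a {{#}} in the escaped text is always followed by either another {{#}} or a decimal number, the reader never has to guess which case it is in, so every value round-trips.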


> Under certain situations Solr produces managed schema XML that cannot be loaded
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-16810
>                 URL: https://issues.apache.org/jira/browse/SOLR-16810
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>    Affects Versions: 9.2.1
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While persisting the {{ManagedIndexSchema}} as XML, non-printable characters in field names get escaped as {{#nn;}}, where {{nn}} is the decimal representation of the non-printable character. For example, if the field name has the byte {{0x14}}, it gets escaped as {{#20;}}. This is indistinguishable from the literal {{#20;}} in the field name. If we have two fields, one with the non-printable character and the other with the literal string, two fields get generated with the same name. Loading the resulting XML, naturally, causes an exception. To fix this, any occurrence of a literal {{#}} in the field name should be escaped, with, say, {{##}}.
> A second problem is that while escaping happens when generating XML, the corresponding unescaping does not happen on loading it. This asymmetry should be fixed as well.
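
The collision described in the issue can be reproduced in miniature. The sketch below is an illustration only: {{naiveEscape}} is a hypothetical stand-in that mimics the described behaviour (control characters become {{#nn;}}, a literal {{#}} is left alone), not Solr's actual writer, and the field names are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: naiveEscape is a hypothetical stand-in, not Solr's actual XML writer.
final class EscapeCollisionDemo {

    // Writes control characters as "#<decimal>;" but leaves a literal '#' untouched.
    static String naiveEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20) {
                out.append('#').append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two distinct field names: one ends in the control character 0x14,
        // the other ends in the literal text "#20;".
        List<String> fieldNames = List.of("f\u0014", "f#20;");
        Map<String, String> written = new LinkedHashMap<>();
        for (String name : fieldNames) {
            String previous = written.put(naiveEscape(name), name);
            if (previous != null) {
                System.out.println("Two distinct fields were written under the name "
                        + naiveEscape(name));
            }
        }
    }
}
```

Both names come out as {{f#20;}}, so a schema containing both fields would be written with a duplicate field name.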


