You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Luca Cavanna (Created) (JIRA)" <ji...@apache.org> on 2012/03/06 15:16:58 UTC

[jira] [Created] (SOLR-3207) Add field name validation

Add field name validation
-------------------------

                 Key: SOLR-3207
                 URL: https://issues.apache.org/jira/browse/SOLR-3207
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 4.0
            Reporter: Luca Cavanna
             Fix For: 4.0


Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.

The rules, taken from the actual StrParser behaviour, seem to be the following: 

- same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'

- for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3207) Add field name validation

Posted by "Luca Cavanna (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230408#comment-13230408 ] 

Luca Cavanna commented on SOLR-3207:
------------------------------------

The first letter should be ok as checked in my patch. Regarding the trailing characters, do you mean we shouldn't use isJavaIdentifierPart anymore but something else? That's even more restrictive than my patch since I've used the existing rules applied while parsing the fl parameter (ReturnFields class). No problem for me, are we all sure we want to proceed this way?

I'll update my patch later on. Then I'd document this within the Schema wiki page. 

That's a big change, any opinion is welcome!
                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>         Attachments: SOLR-3207.patch
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3207) Add field name validation

Posted by "Luca Cavanna (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223272#comment-13223272 ] 

Luca Cavanna commented on SOLR-3207:
------------------------------------

Guys, let me know your thoughts about those rules, I probably missed something!
                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3207) Add field name validation

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230422#comment-13230422 ] 

Yonik Seeley commented on SOLR-3207:
------------------------------------

bq. Regarding the trailing characters, do you mean we shouldn't use isJavaIdentifierPart anymore but something else?

That was just a shortcut... looking again, it's pretty open (maybe more open than we want?) esp since unicode changes over time.  Anyway, isJavaIdentifierPart doesn't include "-" or "."  If people do need another separator type character, we could allow "$" too (just not as the first char, since that's taken by variable dereferencing).

bq. That's even more restrictive than my patch since I've used the existing rules applied while parsing the fl parameter (ReturnFields class).

Allowing '-' in the "fl" was just to resolve that "regression" for people who already used fieldnames like that and are upgrading.
If we want to start validating field names strictly, then we should bump the schema version number (and should skip validating when the version number is less than that).

                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>         Attachments: SOLR-3207.patch
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3207) Add field name validation

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230287#comment-13230287 ] 

Yonik Seeley commented on SOLR-3207:
------------------------------------

bq. same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'

I think we should prob define it as I documented in the schema:

   <!-- field names should consist of alphanumeric or underscore characters only and
      not start with a digit.  This is not currently strictly enforced,
      but other field names will not have first class support from all components
      and back compatibility is not guaranteed.
   -->

                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>         Attachments: SOLR-3207.patch
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3207) Add field name validation

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230689#comment-13230689 ] 

Hoss Man commented on SOLR-3207:
--------------------------------

the giant elephant in the room that doesn't seem to have been discussed is that trying to validate that field names meet some strict criteria when loading schema.xml doesn't really address dynamic fields -- the patch ensures that <dynamicField name="..."/> configurations have names which are validated, but i don't see anything that would considering the actually field names people use with those dynamic fields -- ie: "*_i" might be a perfectly valid dynamicField at startup, but that startup validation isn't going to help me if i index a document containing the field name "{{$ - foo_i}}"

In general, i'm opposed to the idea of "locking down" what field names can be used across the board.  My preference would be to let people us any field name their heart desires, but provide better documentation on what field name restrictions exist on which features and provide (ie: "using a field name in function requires that the field name match ValidatorX; using a field name in fl requires can only be used with field names conform to ValidatorX and ValidatorY; etc...").

If we want to provide automated "validation" of these things for people, then let's make it part of the LukeRequestHandler: for any field name returned by the LukeRequestHandler, let's include a warnings section advising them which validation rules that field name doesn't match, and what features depend on that validation rule -- this info could then easily be exposed in the admin UI.

We could also provide an optional UpdateProcessor people could configure with a list of individual Validators which could reject any document containing a field that didn't match the validator (or optionally: translate the field name to something thta did conform) to help people enforce these even on dynamic fields.

So by default: any field is allowed, but if i create one with a funky name (either explicitly or as a result of loading data using a dynamicField) the admin UI starts warning me that feature XYZ won't work with fields A, B, C; and if i want to ensure feature D works will all of my fields i add an update processor to ensure it.

                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>         Attachments: SOLR-3207.patch
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-3207) Add field name validation

Posted by "Luca Cavanna (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated SOLR-3207:
-------------------------------

    Attachment: SOLR-3207.patch

First draft patch. I introduced a new FieldNameValidator class which is used within  the IndexSchema class to validate every field name. The new class exposes also some boolean methods which are used within the ReturnFields class, in order to apply the same rules there to detect a field name. That's needed to make sure that we accept field names that we can handle within the fl parameter.

Apparently, if you use a placeholder as field name you receive on IndexSchema the default value, which can be empty. That's why I'm allowing empty field names. I'm not even sure I understood correctly how placeholders work, can someone help me out with this?

Let me know what you think about my patch!
                
> Add field name validation
> -------------------------
>
>                 Key: SOLR-3207
>                 URL: https://issues.apache.org/jira/browse/SOLR-3207
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Luca Cavanna
>             Fix For: 4.0
>
>         Attachments: SOLR-3207.patch
>
>
> Given the SOLR-2444 updated fl syntax and the SOLR-2719 regression, it would be useful to add some kind of validation regarding the field names you can use on Solr.
> The objective would be adding consistency, allowing only field names that you can then use within fl, sorting etc.
> The rules, taken from the actual StrParser behaviour, seem to be the following: 
> - same used for java identifiers (Character#isJavaIdentifierPart), plus the use of trailing '.' and '-'
> - for the first character the rule is Character#isJavaIdentifierStart minus '$' (The dash can't be used as first character (SOLR-3191) for example)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org